The Journal of Consulting Psychology will
accept Brief Reports of research studies in 
clinical psychology for early publication with- 
out expense to the author. The procedure is 
intended to permit the publication of soundly 
designed studies of specialized interest or lim- 
ited importance which cannot now be ac- 
cepted because of lack of space. Several pages 
in each issue will be devoted to Brief Reports, 
published in the order of their receipt with- 
out respect to the dates of receipt of the regu- 
lar articles. Most Brief Reports appear in the 
first or second issue to go to press following 
their final acceptance. 


An author who wishes to submit a Brief 
Report: 


1. Sends the Brief Report, limited to one printed 
page and prepared according to the specifications 
given below. 

2. Also sends to the Editor a full report of the re- 
search study, in sufficient detail to give a clear ac- 
count of its background, procedure, results, and con- 
clusions, which will be filed with the American 
Documentation Institute to insure indefinite avail- 
ability. 

3. Prepares at least 100 mimeographed copies of 
the full report, which the author will send without 
charge to all who request it as long as the supply 
lasts. 

4. Agrees not to submit the full report to another 
journal of general circulation. 


Specifications 


Brief Report. The Brief Report should give 
a clear, condensed summary of the procedure 
of the study and as full an account of the re- 
sults as space permits. 


Brief Reports 


To insure that the Brief Report will be no 
longer than one printed page, its typescript, 
including all matter except the title and the 
author’s lines, must not exceed 70 lines av- 
eraging 42 characters and spaces in length. 
Set the typewriter margins for short lines of 
42 characters, which are 3.5 inches long in 
elite typing, and 4.2 inches long in pica. 
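The margin figures follow from typewriter pitch. The sketch below is a hypothetical checker. It assumes the standard pitches of 12 characters per inch for elite type and 10 for pica, and it reads the quota as at most 70 lines whose average length is at most 42 characters and spaces. None of the names come from the notice.

```python
# Hypothetical helpers for checking a Brief Report typescript against the quota.

ELITE_CPI = 12  # assumed standard pitch: characters per inch, elite type
PICA_CPI = 10   # assumed standard pitch: characters per inch, pica type

MAX_LINES = 70   # quota; title, author's lines, and footnote sheet excluded
LINE_CHARS = 42  # target average of characters and spaces per line


def line_width_inches(chars_per_line: int, chars_per_inch: int) -> float:
    """Physical width of a typed line holding `chars_per_line` characters."""
    return chars_per_line / chars_per_inch


def within_quota(body_lines: list[str]) -> bool:
    """True if the body has at most 70 lines averaging at most 42 characters."""
    if len(body_lines) > MAX_LINES:
        return False
    if not body_lines:
        return True
    average = sum(len(line) for line in body_lines) / len(body_lines)
    return average <= LINE_CHARS


if __name__ == "__main__":
    print(line_width_inches(LINE_CHARS, ELITE_CPI))  # 3.5
    print(line_width_inches(LINE_CHARS, PICA_CPI))   # 4.2
    print(within_quota(["x" * 42] * 70))             # True
    print(within_quota(["x" * 42] * 71))             # False
```

Read this way, the quota caps the body at about 70 × 42 = 2,940 characters and spaces.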

The manuscript of the Brief Report must 
be double spaced throughout. Except for its 
short lines, it follows the standard style (1). 
Headings, tables, and references are avoided 
or, if essential, must be counted in the 70 
lines. Each Brief Report must be accom- 
panied by a footnote in the style below, 
which is typed on a separate sheet and not 
counted in the 70-line quota:


¹ An extended report of this study may be obtained
without charge from John Doe, 300 Market
St., Prospect 6, Mass. (giving the author’s full name 
and address), or for a fee from the American Docu- 
mentation Institute. Order Document No. —— from 
ADI Auxiliary Publications Project, Photoduplica- 
tion Service, Library of Congress, Washington 25, 
D. C., remitting in advance $— for microfilm or 
$— for photocopies. Make checks payable to Chief, 
Photoduplication Service, Library of Congress. 


Extended report. The full report is pre- 
pared in the style specified by the Publica- 
tion Manual (1), except that it may be typed 
with single spacing for economy in photo- 
duplication by the ADI. 
Validation of the Lowenfeld Mosaic Test¹


Monroe L. Levin 


Teachers College, Columbia University²


The study upon which this report is based 
was stimulated by personal experience with 
the Mosaic Test which had led to the impres- 
sion that it was a useful clinical instrument. 
Although the test has been applied to many 
clinical problems and has been endorsed by 
numerous articles, a critical review revealed 
the literature to be contradictory, diffuse, and 
inconsistent. Furthermore, such basic issues 
as scoring reliability and experimental vali- 
dation of hypotheses had never been ade- 
quately investigated. 


Method 


The double set of the Lowenfeld Mosaic 
Test was administered to all subjects accord- 
ing to the standardized instructions recom- 
mended for use in this country (18). All sub- 
jects were limited to a performance time of 
twenty minutes. The final Mosaics were pho- 
tographed on 35 mm. Type A Kodachrome 
Film yielding color transparencies which were 
readily examined by means of a slide pro- 
jector. 


Hypotheses 


Hypotheses were in the form of statements of expectancy which stated that a given Mosaic sign would occur significantly more often in the experimental group being examined. Hypotheses were formulated which were expected to differentiate (a) institutional from noninstitutional groups, (b) normals from all other subjects, (c) “maladjusted” from normal subjects, (d) six diagnostic groups from one another, and (e) subjects with high scores in the clusters obtained from the Wittenborn Psychiatric Rating Scales from those with low scores in the same clusters. Hypotheses were also tested which had been suggested or inferred by Lowenfeld and Wertham but which had not been originally included in the study. Most of the hypotheses were based on the published Mosaic literature.

¹ Based upon a doctoral dissertation submitted to Columbia University in 1954. The author is indebted to professors L. F. Shaffer, E. J. Shoben, Jr., and H. Solomon for their assistance and guidance. The research was performed while the author was a Research Fellow of the National Institute of Mental Health, United States Public Health Service.

² Currently at Western State Hospital and Christian County Mental Health Center, Hopkinsville, Kentucky.


Institutional and noninstitutional groups. The Mo- 
saic Test has been regarded as a Gestalt instrument 
which reflects the functioning level of the individu- 
al’s cognitive and emotional processes in concrete 
situations (16, 21). Since the incoherent Mosaic is 
by definition the most disorganized response to the 
test, it was anticipated on an a priori basis that in- 
coherent designs would occur much more frequently 
in institutional groups than in noninstitutional 
groups which are able to function in society, re- 
gardless of diagnostic considerations. It was there- 
fore hypothesized that when the institutional groups 
were compared with the noninstitutional groups, in- 
coherent designs would be found to occur signifi- 
cantly more often in the former group. 

Success and adjustment. It was anticipated that 
when normal and pathological groups were com- 
pared, unsuccessful designs would occur more fre- 
quently among the pathological groups since lack of 
success has generally been regarded as a gross sign 
of psychopathology (2, 6, 10, 12, 16, 17). It was 
also anticipated that the proportions of occurrence 
would be such that the sign could be used in mak- 
ing individual discriminations. 

“Maladjusted” vs. normal subjects. Subjects who 
were functioning in society were divided into “nor- 
mal” and “maladjusted” groups on the basis of 
scores on the Cornell Index (Form N2). Since the 
“maladjusted” group was heterogeneous, it was impos- 
sible to develop hypotheses which could have been 
expected to identify a majority. It was possible, 
however, to select Mosaic characteristics which had 
been regarded as being indicative of the presence of 
a wide variety of emotional disturbances such as
might be present among the “maladjusted” group. 

Twenty characteristics were selected. It was rea- 
soned that if the twenty signs have utility in dis- 
criminating between better and less well adjusted 
socially functioning individuals, they would occur 
significantly more often among the “maladjusted” 
group. The signs which were selected as hypotheses 
and the sources for each are given in Table 2. 

After the data were analyzed, it was decided to 
determine whether, as anticipated, “maladjusted” 
subjects produced more of the individual signs than 
did normals: i.e., whether the sum of the signs was 
discriminative. 

Diagnostic groups. In formulating hypotheses for 
the diagnostic groups, signs were combined to form 
a multiple hypothesis when they could be expected 
to occur in about the same proportion of the sample. 
Experience with the Mosaic Test was relied upon to 
construct final hypotheses which seemed most likely 
to yield positive results. The specific hypotheses and 
their sources are given in Table 3. The hypothesis 
that psychotic subjects would use fewer pieces than 
nonpsychotic subjects was also tested. 

Wittenborn cluster groups. Since the Mosaic litera- 
ture rarely differentiated between specific types of 
neurosis or psychosis, it was necessary to devise a 
procedure for assigning hypotheses to the Witten- 
born clusters. It was possible to assign some Mosaic 
signs of neurosis and psychosis to specific “neurotic” 
and “psychotic” cluster groups. Additional hypothe- 
ses were formulated by analyzing the Wittenborn 
Scales in terms of the Mosaic characteristics which 
the experimenter believed could be expected from 
individuals representing the extreme of behavior on 
which each of the scales is based. It was then pos- 
sible to obtain “Mosaic clusters” which were hypo- 
thetically directly related to the Wittenborn clusters. 
Table 4 lists each of the specific hypotheses and their 
sources. It was also hypothesized that subjects at 
the upper end (most pathology) of the depressed 
cluster group would have a slower reaction time 
than other groups. 

Lowenfeld’s and Wertham’s signs. Those signs of 
Lowenfeld and Wertham which had not been in- 
cluded as hypotheses or else had been modified dras- 
tically, but which were examined after the data had 
been gathered, are indicated in Tables 5 and 6, re- 
spectively. 


Subjects 


Since most of the Mosaic literature is con- 
cerned with descriptions of the performances 
of diagnostic groups, and statements about 
the usefulness of test characteristics for dif- 
ferential diagnosis, psychiatric diagnosis was 
the first criterion with which Mosaic Test per- 
formance was compared. In addition to the 
groups which were defined as “normal” and 
“maladjusted,” four institutionalized groups 
were included in the study. The institutionalized
groups consisted of diagnosed neurotics,
familial mental defectives, general paretics 
with psychoses, and schizophrenics. 

Because of the unreliability of psychiatric 
diagnosis, the Wittenborn Psychiatric Rating 
Scales were used to select groups for the sec- 
ond phase of the research. 

Selective criteria. All subjects were white 
males between the ages of twenty-one and 
fifty-nine. Fourteen of the normals, three of 
the “maladjusted,” four of the neurotic, seven 
of the paretic, and one of the schizophrenics 
were foreign born. 


All of the noninstitutional subjects were function- 
ing in society at the time of testing. They were 
drawn from a variety of residential and occupational 
settings in New York City. The group which was 
defined as normal consisted of those noninstitution- 
alized persons whose responses to the Cornell Index 
(Form N2) resulted in scores of twelve or less with 
no “stop” items, while the group defined as “mal- 
adjusted” consisted of individuals whose scores on 
the Cornell Index exceeded the assigned cutoff level. 
Persons whose indices revealed that they had his- 
tories of mental illness, ulcers, convulsions, asthma, 
drug or alcohol addiction, were dropped from the 
study, and those known to be in psychotherapy at 
the time of testing were also excluded. 
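The selection rule for the two noninstitutional groups can be written as a small decision function. This is a minimal sketch. The record type, the function name, and the `maladjusted_cutoff` parameter are hypothetical, because the text does not give the "assigned cutoff level". The exclusion list mirrors the histories named above.

```python
from dataclasses import dataclass, field
from typing import Optional

# Histories that led to exclusion, as listed in the text.
EXCLUDED_HISTORIES = frozenset(
    {"mental illness", "ulcers", "convulsions", "asthma",
     "drug addiction", "alcohol addiction"}
)
NORMAL_MAX_SCORE = 12  # "scores of twelve or less with no 'stop' items"


@dataclass
class CornellRecord:
    """Hypothetical record of one subject's Cornell Index (Form N2) results."""
    score: int
    stop_items: int
    histories: set[str] = field(default_factory=set)
    in_psychotherapy: bool = False


def classify_noninstitutional(rec: CornellRecord,
                              maladjusted_cutoff: int) -> Optional[str]:
    """Return "normal", "maladjusted", or None (excluded or unclassified).

    `maladjusted_cutoff` is hypothetical: the study's value is not reported.
    """
    if rec.in_psychotherapy or rec.histories & EXCLUDED_HISTORIES:
        return None  # dropped from the study
    if rec.score <= NORMAL_MAX_SCORE and rec.stop_items == 0:
        return "normal"
    if rec.score > maladjusted_cutoff:
        return "maladjusted"
    return None  # meets neither criterion


if __name__ == "__main__":
    # 23 is an arbitrary illustrative cutoff, not the study's value.
    print(classify_noninstitutional(CornellRecord(8, 0), maladjusted_cutoff=23))
    print(classify_noninstitutional(CornellRecord(30, 2), maladjusted_cutoff=23))
```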

None of the institutionalized subjects had ever had a previous diagnosis which conflicted with the current one: e.g., none of the neurotics or defectives had been previously diagnosed as being psychotic. With the exception of some of the neurotics, there was no staff disagreement concerning the adequacy of the diagnoses. It was necessary to disregard staff differences and use the official diagnoses in order to obtain a sample of diagnosed neurotics. It is possible that some of the neurotic subjects were in the initial stages of psychoses.

Subjects in the general paretic group had 
had no history of mental illness prior to the 
onset of luetic symptoms, and none of the
familial mental defectives manifested note- 
worthy symptoms of emotional disturbance. 
There were 52 normal, 14 “maladjusted,” 14 
neurotic, 34 defective, 29 paretic, and 35 
schizophrenic subjects. 

It was possible to obtain ratings of 35 
schizophrenic, 29 general paretic, and 10 neu- 
rotic subjects on the Wittenborn Psychiatric 
Rating Scales. Groups were selected to repre- 
sent the upper and lower extremes of each 
Wittenborn symptom cluster by means of 
scores on each of the nine clusters. Subjects 
at the lower end (least pathology) of each
cluster were compared with subjects at the
upper end (most pathology) of that cluster. 
Because of sampling difficulties, two sets of 
samples were selected from the upper ends of 
some of the Wittenborn clusters: a smaller 
sample in which the extremes of pathology 
were most marked, and a larger sample in 
which the extremes of pathology were some- 
what attenuated. 

Uncontrolled variables. It was not possible 
to control the variables of socioeconomic 
status, intelligence, age, and ethnic origin. 
Attempts were made to avoid overrepresenta- 
tion of any one socioeconomic group, al- 
though if Mosaic Test responses are pri- 
marily determined by personality character- 
istics, socioeconomic status should not be a 
major determiner of test responses. 

There has been a consensus that, after the 
Mental Age of eight years, age is a relatively 
unimportant factor in Mosaic performance 
(2, 14). Conversely, it might be stated that 
after MA 8-0, intelligence is a relatively un- 
important factor in Mosaic performance. It 
was decided to allow intelligence to remain 
uncontrolled because of the difficulty of ob- 
taining accurate measures from certain of the 
pathological samples. It seemed likely that 
intelligence would be a minor variable when 
compared with personality, particularly since 
sampling methods probably resulted in a 
wide range of intelligence levels among every 
group except the mentally defective one. 

Two auxiliary studies demonstrated that 
lack of control of the variables of age and 
ethnic origin did not seriously affect the re- 
sults (8). 


Scoring Method 


In a preliminary experiment, a method for 
scoring Mosaic characteristics was devised. 
Scoring of 29 Mosaics by three independ- 
ently trained judges and the author yielded 
proportions of agreement on Mosaic charac- 
teristics which ranged from .45 to 1.00. 
Chance alone would have yielded a ratio of 
.125 perfect agreements in any instance. Ad- 
ditionally, tests of concordance revealed that 
the judges never deviated from an arbitrary 
standard which required them to agree at 
least 90 per cent of the time. It was therefore 
concluded that the method for scoring Mosaic characteristics was teachable and resulted in high interjudge agreement.
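The .125 chance figure can be read as follows. Suppose each of the four scorers (the three judges and the author) independently marked a sign present or absent, with equal probability for each rating. Then all four would agree with probability 2 × (1/2)⁴ = 1/8. This chance model is an assumption, because the paper does not state it. The function name is hypothetical.

```python
from fractions import Fraction


def chance_perfect_agreement(n_judges: int, n_categories: int = 2) -> Fraction:
    """Probability that every judge gives the same rating by chance alone,
    assuming each judge picks one of `n_categories` ratings uniformly at
    random and independently of the others."""
    # Any of the categories can be the shared rating; for a given category,
    # all n_judges must land on it, with probability (1/n_categories)**n_judges.
    return n_categories * Fraction(1, n_categories) ** n_judges


if __name__ == "__main__":
    p = chance_perfect_agreement(4)  # three judges plus the author
    print(p, float(p))               # 1/8 0.125
```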

Steps were taken to eliminate bias in scor- 
ing of the Mosaics in the research proper. To 
determine whether bias had been successfully 
controlled, and to re-examine scoring reli- 
ability on the research data proper, a lay- 
man was trained to score Mosaics. She then 
scored the 20 Mosaics considered to have 
been most difficult to score. Percentages of 
agreement between the layman judge and 
the author ranged from 75 to 100 per cent 
on the hypothesized signs. In every instance 
the obtained agreements differed significantly 
from chance expectancy. It was concluded 
that bias in scoring had been successfully con- 
trolled and that the reliability of the devised 
method of Mosaic scoring was independently 
reconfirmed (8). 

Reliabilities of combinations of signs. Most 
of the hypotheses consisted of several signs 
used in combination, while the reliability data 
were based almost entirely on single Mo- 
saic signs. Although the multiple hypotheses 
should have lower scoring reliabilities than 
the individual signs, the scoring reliabilities 
which were obtained were so high that it was 
assumed that the combined signs would have 
satisfactory scoring reliabilities. 


Statistical Methods 


Frequencies of occurrence of hypothesized 
signs were often so small that minor changes 
in frequencies could have seriously modified 
the statistical conclusions. The 1 per cent 
level was set as the criterion of statistical 
significance to offset the effects of chance 
sampling fluctuations. 

Descriptive method. Values of √(χ²/N) were computed in those cases where subjects were not selected independently of each other. The method was appropriate for examining the results obtained when Wittenborn cluster groups were compared. The notation used is indicated in Table 1. The statistic used was

$$\sqrt{\chi^2/N} = \sqrt{\frac{(A - D)^2}{(A + D)\,N}}$$

where A and D are the cells defined in Table 1 and N is the total number of subjects compared (A + B + C + D). The closer the value of √(χ²/N) to 1.00, the better the agreement between the Wittenborn Scales and the Mosaic Test in selecting individuals who showed and failed to show the clusters of behavior which were being considered.


Table 1

Notation Used in Descriptive Analysis of Data

                        Wittenborn Cluster Score
Sign                       Low    High
Mosaic sign absent           A       B
Mosaic sign present          C       D


Table 2

Frequencies of Occurrence and Values of z for Twenty Signs of Maladjustment When Maladjusted and Normal Groups Are Compared

                                                                     Maladjusted    Normal
Item                                    References                             f         f         z
 1. Edge                                3, 7, 9, 10, 14, 16, 17, 19            3        12     -1.17
 2. Corner                              7, 9, 10                               0         1      -.92
 3. Frame                               1, 7, 9, 10, 14, 16, 17, 19            2         3       .96
 4. All-over                            2, 3, 16, 17, 19                       1         6      -.51
 5. Well-defined cross                  9                                      0         3     -1.62
 6. Supersymmetry                       2, 6, 16, 17                           0         0         0
 7. Active C rejection plus emphasis    2, 16, 17                              6        11      1.57
    on F
 8. Emph. on C without emph. on F       2, 16, 17                              0         2     -1.32
 9. Unsuccessful                        2, 6, 10, 16, 17                       9        14      2.56
10. 40% or more black in comb. with     9, 14, 16, 17                          1         0      1.79
    other Cs. No others preponderant
11. 40% or more black and blue in       16, 17                                 5        13       .78
    comb. with other Cs. No others
    preponderant
12. 40% or more red                     16, 17                                 0         6      -2.3
13. Serrated red edge                   9                                      1         0     -1.79
14. Superimposition of stones           2, 14                                  0         4     -1.89
15. Attempted edging                    2                                      0         1      -.92
16. Winged designs                      7, 9, 16, 17                           0         3     -1.62
17. Downward arrow                      19                                     1         5       -.3
18. Repetitive simple design            2, 16                                  0         0         0
19. Formation of letters of alphabet    –                                      0         3     -1.62
20. Use of full 20′ with resultant      –                                      0         2     -1.32
    unsuccessful design
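The agreement statistic √(χ²/N) can be computed directly from the four cells of Table 1. This is a minimal sketch. It assumes that N is the total number of subjects in the comparison (A + B + C + D). Under that assumption it reproduces Table 4 entries such as .75 for the edge design in acute anxiety and .86 for the frame design in conversion hysteria. The function name is hypothetical.

```python
import math


def agreement_index(a: int, b: int, c: int, d: int) -> float:
    """Return sqrt(chi^2 / N) with chi^2 = (A - D)^2 / (A + D).

    Cells follow Table 1: rows are sign absent (A, B) and present (C, D);
    columns are low (A, C) and high (B, D) Wittenborn cluster scores.
    N is assumed to be the total count A + B + C + D.
    """
    if a + d == 0:
        raise ValueError("index is undefined when A + D is zero")
    n = a + b + c + d
    chi_sq = (a - d) ** 2 / (a + d)
    return math.sqrt(chi_sq / n)


if __name__ == "__main__":
    # Table 4 columns map onto Table 1 cells as 1 -> A, 2 -> C, 3 -> B, 4 -> D.
    # Acute anxiety, edge design: columns 28, 11, 5, 1
    print(round(agreement_index(a=28, b=5, c=11, d=1), 2))   # 0.75
    # Conversion hysteria, frame design: columns 36, 1, 12, 0
    print(round(agreement_index(a=36, b=12, c=1, d=0), 2))   # 0.86
```

Only A and D enter the numerator. The disagreement cells B and C affect the index only through N, which is the limitation raised in the Results.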


Results 


Analyses of the data revealed that only one 
hypothesis functioned as anticipated, al- 
though some signs which had been expected 
to discriminate between one group and others 
did so in a few, rather than in all instances. 


Institutional and Noninstitutional Groups 


When the 66 socially functioning subjects were compared with the 112 institutionalized subjects, it was found that incoherent Mosaics were produced by four of the former and twelve of the latter subjects. Arc sine analysis of the data yielded a z of -1.08, indicating that the data clearly failed to support the hypothesis being tested.
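The text reports arc sine z values but does not give the formula. This is a minimal sketch that assumes the common form: each proportion is transformed to φ = 2 arcsin √p, and the difference is divided by √(1/n₁ + 1/n₂). With this form the function reproduces Table 2 values such as .96 for the frame sign and 2.56 for unsuccessful designs. For the incoherent-Mosaic comparison above it gives about -1.09, close to the reported -1.08. The function name is hypothetical.

```python
import math


def arcsine_z(x1: int, n1: int, x2: int, n2: int) -> float:
    """z for the difference between independent proportions x1/n1 and x2/n2,
    using the variance-stabilizing transform phi = 2 * asin(sqrt(p)).
    Each transformed value has variance of about 1/n, so the standard
    error of the difference is sqrt(1/n1 + 1/n2)."""
    phi1 = 2 * math.asin(math.sqrt(x1 / n1))
    phi2 = 2 * math.asin(math.sqrt(x2 / n2))
    return (phi1 - phi2) / math.sqrt(1 / n1 + 1 / n2)


if __name__ == "__main__":
    # Incoherent Mosaics: 4 of 66 socially functioning vs 12 of 112 institutionalized
    print(round(arcsine_z(4, 66, 12, 112), 2))  # -1.09 (text reports -1.08)
    # Table 2, item 3 (frame): 2 of 14 maladjusted vs 3 of 52 normal
    print(round(arcsine_z(2, 14, 3, 52), 2))    # 0.96
    # Table 2, item 9 (unsuccessful): 9 of 14 vs 14 of 52
    print(round(arcsine_z(9, 14, 14, 52), 2))   # 2.56
```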


Success and Adjustment 


In scoring for “success” both the intent of 
the subject and his final Mosaic production 
were scored. In all other instances in this 
study, only the final product was considered. 
Fourteen of the 52 normal subjects and 89 of 
the 126 less well adjusted and endowed sub- 
jects produced unsuccessful designs, resulting 
in a z of 5.67. Although the statistical data 
strongly supported the hypothesis that unsuc- 
cessful designs occur most frequently among 
poorly adjusted and poorly endowed subjects, 
the frequencies of occurrence were such that 
the sign would not be useful in making indi- 
vidual discriminations. Since success and lack 
of success have perfect negative correlation 
by definition, the statistical results also sup- 
ported the implicit hypothesis that successful 
designs occur most frequently among well endowed, well adjusted subjects.


“Maladjusted” vs. Normal Subjects 


The results obtained by comparing the performances of normal and “maladjusted” subjects are summarized in Table 2. None of the signs of maladjustment discriminated between normal and “maladjusted” subjects at the 1 per cent level of significance. A subsequent analysis of the data revealed that normal subjects produced a mean of 1.73 signs of maladjustment and “maladjusted” subjects a mean of 2.07 such signs. Since the standard error of the difference between means was .3 and t had a resultant value of 1.12, it was concluded that the numbers of signs produced by normal and “maladjusted” subjects did not differ significantly. This latter finding needs cross-validation, since the hypothesis was not formulated in advance.
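As a check on the arithmetic, the t ratio here is the difference between the two means divided by the standard error of that difference:

$$t = \frac{2.07 - 1.73}{.3} \approx 1.13$$

The small gap from the reported 1.12 presumably reflects a standard error carried to more places than the rounded .3.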


Diagnostic Groups 


Examination of Table 3 reveals that none of the hypotheses relating to the diagnostic groups functioned as predicted. The data failed to support the additional hypothesis that psychotic subjects would use fewer pieces than nonpsychotic subjects. Psychotic subjects used a mean of 33.3 pieces with a standard deviation of 43.87; all nonpsychotic subjects used a mean of 28.45 pieces with a standard deviation of 23.39; and normal subjects used a mean of 32.64 pieces with a standard deviation of 40.36. Since the differences between these means are small relative to the standard deviations, there were no significant differences between the numbers of pieces used by psychotic, nonpsychotic, and normal subjects.


Table 3

Frequencies of Occurrence, Tests of Significance, and References for Hypothesized Signs Related to Diagnostic Groups

                                                        Exp. Group  Comparison Groups
Sign                                References           Neurotic  Defect.  Paretic   Schiz.  Maladj.   Normal
Corner                              9, 10             f         0        0        1        0        0        1
                                                      z         –        0    -1.15        0        0     -.92
Frame                               9, 10, 16         f         1        1        0        1        2        3
                                                      z         –      .62     1.65      .63      .62      .18
Cross                               9                 f         0        1        1        2        0        3
                                                      z         –    -1.08    -1.15    -1.53        0    -1.62
40% or more black in comb. with     9, 14, 17         f         0        1        0        3        1        0
  other Cs. No others preponderant                    z         –    -1.08        0    -1.89    -1.42        0
Serrated red edge                   9, 16             f         0        0        2        1        1        0
                                                      z         –        0    -1.63    -1.49    -1.42        0
Winged design                       7, 9, 19          f         1        0        0        0        0        3
                                                      z         –      1.7     1.65     1.71     1.42      .18

                                                          Defect. Neurotic  Paretic   Schiz.  Maladj.   Normal
Unembellished fundamentals          9, 19             f         1        0        1        0        0        1
                                                      z         –     1.08     -.13     1.42     1.08    2.98*
Simple design                       9, 13, 15, 19     f        31        1       24       23        7       15
                                                      z         –     6.3*        1     1.04    3.05*    6.36*
No. small, simple compact designs   9, 13, 15, 16, 17 f         3        0        0        0        0        0
                                                      z         –      1.9     2.38      2.5      1.9    2.73*
Repetit. design of paired stones    9, 19             f         3        0        0        0        0        0
                                                      z         –      1.9     2.38      2.5      1.9    2.73*
Enumerative use of color            16, 17            f         7        1        5        4        1        0
                                                      z         –     1.27      .34     1.05     1.27    4.26*

                                                          Paretic Neurotic  Defect.   Schiz.  Maladj.   Normal
Unsuccessful, simple des., C used   2, 12, 16, 17     f        11        0       12        8        6        6
  indiscriminately                                    z         –    4.18*      .21     1.36     -.32    2.87*
Inappropriate use of shape          12, 16, 17        f         ?        0        3        0        1        2
                                                      z         –     2.33      .63    3.03*        ?     1.56

                                                           Schiz. Neurotic  Defect.  Paretic  Maladj.   Normal
Emph. on F with active C rejection  2, 6, 16, 17      f        12        3       13       16        6       11
                                                      z         –      .92     -.34     1.22     -.56     1.34
Supersymmetry                       2, 16, 17         f         ?        1        1        1        0        0
                                                      z         –        1     2.12      1.9    2.71*    3.88*
W. tray covered by complex          2                 f         0        0        0        0        0        0
  incoherent des. begun within 10″                    z         –        0        0        0        0        0
Edging                              2                 f         0        0        0        0        0        1
                                                      z         –        0        0        0        0     -.88
Superimposed stones                 2                 f         6        0        1        4        0        ?
                                                      z         –    2.71*     2.12      .36        ?     1.32
Repetit. simple des. with same      16                f         0        0        0        1        0        0
  shape repeated, variations in C                     z         –        0        0     -1.5        0        0
Abstract des. predominate           16, 17            f         3        5        9       10        3        9
                                                      z         –    -2.18    -2.02    -1.94    -1.17     -1.2

* Difference significant at 1% level.


Wittenborn Cluster Groups 


Table 4 summarizes the data relative to the Wittenborn cluster groups. Examination of Columns 1 and 2 demonstrates that the Mosaic Test usually correctly identified individuals who did not demonstrate the symptom correlates of the clusters. Comparison of Columns 3 and 4, however, shows that the test rarely identified individuals who demonstrated the extreme behaviors associated with the Wittenborn clusters. Values of √(χ²/N) computed from the data in Columns 1 and 4 indicate that the Mosaic Test and Wittenborn Scales were generally in good agreement in identifying pathological and nonpathological individuals, but this measure did not consider instances (Columns 2 and 3) where the Mosaic Test failed to function as anticipated. Were it possible to include all the data in the statistical computations, it is probable that tests of the null hypothesis would reveal that the Mosaic Test functioned no better than chance in identifying individuals on the basis of absence or presence of the behavioral correlates of psychopathology.

Comparisons of both sets of data in Table 
4 indicate that the results were relatively un- 
affected by changes in characteristics of the 
samples. The slight decreases in values of 
√(χ²/N) which resulted from lowering cutoff
scores may have been due as much to in- 
creased sample size as to heightened similar- 
ity of samples, and are not the changes which 
would be anticipated if the Mosaic Test signs 
were highly sensitive. 

When the reaction times of the depressed cluster group were compared with all other cluster groups and with the normal and defective diagnostic groups, it was found that the ranges of reaction times among the several groups were such that individuals in every group overlapped with members of the depressed group. Reaction times of several groups were as slow as or slower than those of the depressed group, and when tests of significance could be conducted, no significant differences were found. The findings remained the same when Wittenborn cutoff scores were lowered to increase the sizes of samples at the most pathological extremes of the cluster groups, and when subjects with reaction times atypical of their groups were not considered.


Table 4

Occurrence of Hypothesized Mosaic Signs and Values of √(χ²/N) for Upper and Lower Ends of Wittenborn Cluster Groups*

                                                                     Lower End       Upper End
                                                               1. Sign 2. Sign 3. Sign 4. Sign
Cluster Group / Sign                      References             Absent Present  Absent Present  √(χ²/N)
Acute anxiety
  Edge design                             12, 14, 16, 17, 19        28      11       5       1      .75
                                                                  (28)    (11)     (7)     (3)    (.69)
  Corner design                           9, 10                     38       1       6       0      .92
                                                                  (38)     (1)    (10)     (0)    (.88)
  40% or more black in comb. with         9, 14                     38       1       6       0      .92
    other Cs. No others preponderant                              (38)     (1)    (10)     (0)    (.88)
Conversion hysteria
  Frame design                            10, 17, 19                36       1      12       0      .86
                                                                  (36)     (1)    (12)     (0)    (.86)
  Projections absent or contained         1                         32       5      11       1      .77
                                                                  (32)     (5)    (11)     (1)    (.77)
Manic and depressed states
  Emphasis on C without emphasis on F     16, 17                    61       2      20       2      .81
                                                                  (61)     (2)    (25)     (2)    (.78)
Manic state
  40% or more red                         2, 16, 17                 45       2       4       0      .94
                                                                  (45)     (2)     (9)     (0)    (.90)
  Red projections make up 50% or more     2, 16, 17                 46       1       4       0      .95
    of all projections                                            (46)     (1)     (8)     (1)    (.88)
Depressed state
  Slower reaction time than other         2
    groups†
  Emphasis on C without emphasis on F     16, 17                    61       2      20       2      .81
                                                                  (61)     (2)    (20)     (2)    (.81)

Table 4 (continued)

  40% or more black plus blue in comb.    16, 17                    12       4      13       5      .29
    with other Cs. No others preponderant                         (12)     (4)    (13)     (5)    (.29)
  Downward arrow                          19                        14       2      18       0      .64
                                                                  (14)     (2)    (18)     (0)    (.64)
Phobic compulsive
  Use full 20′ to make coherent           3                         50       2       7       0      .92
    unsucc. all-over design                                       (50)     (2)    (14)     (0)    (.87)
Schizophrenic excitement
  W. tray covered by complex              2                         11       0      12       0      .69
    incoherent des. begun within 10″                              (11)     (0)    (12)     (0)    (.69)
Paranoid condition
  Small slab                              16, 17                    38       0       4       0      .95
                                                                  (38)     (0)     (7)     (0)    (.92)
  Formation of letters of alphabet        –                         37       1       4       0      .94
                                                                  (37)     (1)     (7)     (0)    (.91)
Paranoid schizophrenic
  Supersymmetry                           2, 16, 17                 19       2       7       0      .79
                                                                  (19)     (2)    (12)     (1)    (.69)
  Repetitive simple des., same shape      16                        20       1       7       0      .85
    repeated with variations in color                             (20)     (1)    (13)     (0)    (.77)
Hebephrenic schizophrenic
  Simple incoherent design                2, 5, 6, 16, 17           26       3       7       2      .74
                                                                  (26)     (3)    (13)     (3)    (.63)
  Edging                                  2                         29       0       9       0      .87
                                                                  (29)     (0)    (16)     (0)    (.80)
  Superimposition of stones               2                         28       1       7       2      .77
                                                                  (28)     (1)    (13)     (3)    (.67)

* Data in parentheses were obtained by lowering cutoffs at the upper ends of clusters to obtain larger samples. Where both sets of data are the same, it was not necessary (or in some instances, possible) to obtain a larger sample by lowering cutoffs.
† Data summarized in text of article.


Lowenfeld’s and Wertham’s Signs 


Examination of Tables 5 and 6 reveals that
none of Lowenfeld’s or Wertham’s signs which 
had been omitted from the original research 
plan, or greatly modified for inclusion, func- 
tioned as anticipated. Since these signs were 
not given as hypotheses which had been for- 
mulated in advance, cross validation of the 
findings is indicated. 


Discussion 


The overwhelmingly negative results justify the conclusion that the Mosaic Test cannot be used in its present form as a tool for differential diagnosis. It may be argued that many of the signs were modified for incorporation into the study and that a molecular rather than a global approach was used. Yet the signs suggested by Lowenfeld and Wertham which were tested separately from the major data also failed to function as anticipated. As regards a global approach to Mosaic analysis, careful examination of the literature reveals that, in spite of statements to the contrary, discrete molecular signs are those most frequently presented as having diagnostic significance.


Table 5

Lowenfeld’s Signs Not Used as Hypotheses

                                                        Exp. Group  Comparison Groups
Sign                                References           Neurotic  Defect.  Paretic   Schiz.  Maladj.   Normal
Unsuccessful                        10                f         5       28       26       21        9       14
                                                      z         –   -3.14*   -3.17*    -1.56    -1.53      .63
Slab                                10, 17            f         0        2        3        3        0        1
                                                      z         –    -1.55   -2.01*    -1.89        0     -.92

                                                          Defect. Neurotic  Paretic   Schiz.  Maladj.   Normal
No. small, simple designs           9                 f         8        0        0        4        0        0
                                                      z         –    3.19*        4     1.34    3.19*    4.58*

* Difference significant at 1% level.


Table 6

Wertham’s Signs Not Used as Hypotheses

                                                        Exp. Group  Comparison Groups
Sign                                References           Neurotic  Defect.  Paretic   Schiz.  Maladj.   Normal
40% or more black and blue in       16, 17            f         3       13        3       15        5       13
  comb. with other Cs. None other                     z         –    -1.17      .94    -1.48     -.84     -.28
  preponderant
40% or more red                     16                f         0        0        0        2        0        6
                                                      z         –        0        0    -1.53        0      2.3

                                                          Defect. Neurotic  Paretic   Schiz.  Maladj.   Normal
No. small des., each of one shape   17                f         4        0        1        2        0        0
                                                      z         –      .22     1.29        ?      .22    3.17*
No. small, simple, compact,         16, 17            f         0        0        0        0        0        0
  successful                                          z         –        0        0        0        0        0

                                                           Schiz. Neurotic  Defect.  Paretic  Maladj.   Normal
Small slabs                         16, 17            f         1        0        1        0        0        0
                                                      z         –     1.09        0     1.36     1.09     1.56
Emph. on form without emph. on      16, 17            f        20        ?       21       11        5       18
  color                                               z         –     1.38        ?     1.54     1.38     2.07
Use of yellow, white, or green      16                f         1        ?        1        0        2        1
  alone                                               z         –     1.09        0     1.36    -1.38       .3
Black and white used alone          16                f         0        ?        0        0        0        1
                                                      z         –        0        0        0        0    -1.26

* Difference significant at 1% level.

It seems likely that when the Mosaic Test 
is used by experts in a clinical setting, many 
cues which are not directly related to Mosaic 
performance are used. The source of these 
cues may be erroneously identified as the 
Mosaic Test, rather than the interview aspects 
of the testing situation. 


Summary 


This paper is a report of an experimental 
validation study of the Lowenfeld Mosaic 
Test. The method of group comparisons was 
used to determine whether there is justifica- 
tion for using the Mosaic Test in its present 
form as a clinical tool. Mosaic signs were 
compared to two criteria, psychiatric diag- 
nosis and scores on the Wittenborn Psychi- 
atric Rating Scales. 

The results of the study were sweepingly 
negative and it was concluded that there is 
no justification for continuing to use the 
Lowenfeld Mosaic Test in its present form. 
It was suggested that the discrepancies be- 
tween clinical impressions and experimental 
findings may be attributed to failure to recog- 
nize that diagnostic cues are obtained from 
interview aspects of the testing situation, 
rather than from the Mosaic Test itself. 


Received October 6, 1955. 
Plausibility and Depth of Interpretation


Seymour Fisher¹
Walter Reed Army Institute of Research 


One of the persistent problems in attempt- 
ing to study the psychotherapeutic process in 
a scientifically objective manner lies in the 
general looseness of psychiatric terminology. 
Terms such as “emotional insight,” “trans- 
ference,” “unconscious hostility”—to mention 
but a few—are difficult to translate into ac- 
ceptable operations by which they can be 
measured. And until satisfactory measure- 
ments are available, the meaning of a con- 
cept (i.e., its empirical interrelations with 
other variables) cannot be determined to any 
precise degree. 

“Interpretations,” both explicit and implicit, 
are believed to play a major role in many 
psychotherapies, particularly those which are 
analytically oriented. Operational definitions 
of interpretations have been fairly satisfac- 
tory (e.g., 5, 11, 12), in the sense that rea- 
sonably high interjudge agreement can be ob- 
tained in separating “interpretations” from 
other classes of therapist statements. The 
question of the “depth” of interpretative 
statements, however, has received relatively 
little research attention, although the con- 
cept is extremely prevalent in therapeutic 
literature. Studies by Collier (1, 2) and Har- 
way et al. (7) have made preliminary steps 
toward solving some of the problems in the 
measurement of depth, but the meaning of 
the term remains vague and obscure. 

¹ The writer wishes to acknowledge the technical and statistical assistance of Irvin Rubinstein and Ella J. Wilcox. An abbreviated version of this study was presented at the Eastern Psychological Association, March, 1956.

The present study had two primary objectives: (a) to obtain further information concerning the measurement of depth when the ratings were made under fairly well-controlled conditions; (b) to make some introductory exploration into the meaning of the
term by comparing “depth” ratings with in- 
dependent ratings of the same interpretations 
based upon a concept of “plausibility.” It 
was hypothesized that judgments relating to 
depth of interpretation are implicitly derived 
from (and hence should be correlated with) 
the rater’s subjective estimate (i.e., predic- 
tion) of how “plausible” the interpretation is 
to the patient: deep interpretations will be 
considered more “implausible” than shallow 
interpretations. 


Method 


The basic procedure consisted of obtaining 
ratings of interpretations under a number of 
different conditions. In order to control cer- 
tain variables (which will be elaborated at a 
later point), the present design included the 
following aspects: (a) all interpretations were 
presented out of context, with the patient’s 
responses omitted; (b) the ratings were made
from the framework of a single patient whose 
general personality structure was made known 
to the raters; and (c) the raters were asked 
to assume that all interpretations were made 
at a constant time period in therapy. 


Statements 


Sixty therapist statements were obtained: (a) some were selected from recorded protocols, (b) some from published interviews, and (c) some were constructed fictitiously. These statements were all considered to be “interpretations” in accord with a definition adapted from Porter (11) and Snyder (12). In choosing the final 60 interpretations, an attempt was made to include samples which might be expected to cover the entire range of “depth.”

A fictitious case history and Rorschach 
summary were composed, and the wording of 
the interpretations was occasionally modified
to “fit” the history. The interpretations were 
then presented to the judges to be rated 
within the framework of this hypothetical 
patient (although the judges were not in- 
formed that the patient was fictional). 


Rating Instructions 


Two sets of instructions were drawn up, 
one referring to “depth,” the other to “plausibility.”²


Depth. The instructions included the following di- 
rections to the judge: (a) he was requested to rate 
each of the 60 interpretations on a dimension of 
depth, “in the usual sense of the term”; no further 
definition was offered; (b) he was informed that all 
interpretations were made to the same patient, and 
that an abbreviated case history and Rorschach 
summary of the patient accompanied the list of in- 
terpretations; (c) he was asked to assume that all 
interpretations were given during the first three 
hours of therapy; (d) each interpretation was to 
be rated on a 7-point scale: a rating of 1 indicating 
an extremely shallow interpretation, “so shallow 
that one can hardly consider it to be interpretive in 
nature”; a rating of 7 indicating the “deepest pos- 
sible kind of interpretation that could be made to 
this patient.” 

Plausibility. These instructions were identical to 
the “depth” directions, except for the following: 
(a) the term “interpretation” was never used; instead, reference was always made to “therapist statements”; (b) the judge was requested to rate each
statement for its “plausibility from the patient’s point 
of view: Do these statements seem true or false to 
this particular patient?”; (c) each statement was to
be rated on a 7-point scale: a rating of 1 indicating 
a completely plausible statement; a rating of 7 in- 
dicating a completely implausible statement. 


Raters 


A total of 40 independent judges was used 
in this study. The judges were assigned to one 
of four groups depending upon the amount of 
therapy experience they reported and the set 
of rating instructions they received. The four 
groups were composed as follows: (a) depth 


² Copies of the rating instructions, items, and case history, along with complete tables of intercorrelation matrices for the four groups of raters and scatter diagrams of the correlations in Table 2, have been deposited with the American Documentation Institute. Order Document No. 4813 from ADI Publications Project, Photoduplication Service, Library of Congress, Washington 25, D. C., remitting in advance $1.75 for microfilm or $2.50 for photocopies. Make checks payable to Chief, Photoduplication Service, Library of Congress.


ratings were made by 10 psychiatrists, each 
of whom had a minimum of 500 hours of 
therapy experience; (b) depth ratings were
made by 10 professional psychologists, each 
having a minimum of 150 hours of therapy 
experience; (c) plausibility ratings were made 
by a comparable group of 10 professional psy- 
chologists; (d) plausibility ratings were made 
by 10 psychology graduate students (first- 
and second-year students without the M.A. 
degree), none of whom had any therapy ex- 
perience. 

The experienced judges were deliberately 
selected from different geographical regions 
to minimize personal friendship and similar 
training among raters. 

It will be noted that no psychiatrists were 
asked to rate for “plausibility,” since it was 
considered highly probable that, despite the rating instructions, they would still be rating for depth. Similarly, no inexperienced graduate students were asked to rate
for “depth,” since one would not expect them 
to have any notion as to the meaning of the 
term. 

In Table 1 the salient characteristics of 
the four groups of raters are summarized. It 
is clear that the psychiatrists, psychologists, 
and graduate students differ primarily in re- 
spect to therapeutic experience, although in 
addition the graduate students tend to be 
somewhat younger than the professionals. 
Thus, it should be kept in mind that although 
the groups will hereafter be referred to as 
“psychiatrists,” “psychologists,” and “gradu- 
ate students,” differences among these groups 
should not be attributed too literally to the 
different roles; it is believed that, essentially, 


Table 1 


Median Age, Hours of Therapy Experience, and Hours
of Personal Therapy of the Four Groups
(N = 10) of Raters

                                          Hours of therapy   Hours of personal
Raters           Instructions     Age     experience         therapy
Psychiatrists    depth            32.0    2,275.0            239.0
Psychologists    depth            32.5      490.0             25.0
Psychologists    plausibility     33.0      430.0              0.0
Grad. students   plausibility     25.0        0.0              0.0
these groups represent points along a dimension of therapy experience.


Results 
Depth Measurements 


If there is any common meaning attached to the term “depth,” one would expect to find significant intercorrelations among the judges who rated the 60 interpretations. The median r (N = 45) for the 10 psychiatrists³ was .69 (Q₁ = .61, Q₃ = .74), and for the 10 psychologists .78 (Q₁ = .72, Q₃ = .83). Obviously, then, the raters were employing common criteria in judging the “depth” of the items. Since the distribution of intercorrelations within the psychiatrists and psychologists was relatively homogeneous and symmetrical, it was possible to compare the ratings of the two groups: for each of the 60 items, the mean rating of the 10 psychologists was paired with the mean rating of the 10 psychiatrists, and the 60 pairs of means were correlated. The relationship was markedly linear, r = .96. The scatter diagram also suggested a difference in the means of the two distributions. The 60 observations, however, were not independent (the same judges were involved), precluding a direct statistical test of significance. Therefore, the mean of each rater’s 60 ratings was used as a total score, and a t test was computed between the two groups of raters. The mean rating for the psychiatrists was 3.67 (SD = .16); for the psychologists, 4.15 (SD = .26). The t of 4.67 is significant at the .001 level for 18 df, indicating that psychologists tend to attribute greater depth to interpretations than do psychiatrists.
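The group comparison above treats each rater's mean over the 60 items as one score, so each group contributes 10 independent observations and the test has 10 + 10 − 2 = 18 df. A minimal sketch, assuming a pooled-variance (Student) t test, which the 18 df suggests but the text does not name. Plugging in the rounded summary values gives t ≈ 4.97 for the depth raters and ≈ 2.01 for the plausibility raters, rather than the printed 4.67 and 1.9. The text does not say whether its SDs were computed with N or N − 1 in the denominator, and rounding also plays a part, so the sketch illustrates the procedure rather than reproducing the published figures.

```python
import math
from statistics import mean, stdev

def pooled_t(m1: float, sd1: float, n1: int,
             m2: float, sd2: float, n2: int) -> tuple[float, int]:
    """Student's t for two independent groups with pooled variance.
    sd1 and sd2 are sample SDs (N - 1 denominator). Returns (t, df)."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / df
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (m2 - m1) / se, df

def t_from_rater_means(group1: list[float], group2: list[float]) -> tuple[float, int]:
    """Each list holds one total score per rater (that rater's mean over all items)."""
    return pooled_t(mean(group1), stdev(group1), len(group1),
                    mean(group2), stdev(group2), len(group2))

# Rounded summary values reported for the depth raters (10 per group).
t, df = pooled_t(3.67, 0.16, 10, 4.15, 0.26, 10)
print(f"t = {t:.2f}, df = {df}")
```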


Plausibility Measurements 


Similar analyses were performed on the plausibility ratings made by 10 psychologists and 10 early graduate students. The median r for the psychologists was .63 (Q₁ = .56, Q₃ = .68), and for the naive raters .68 (Q₁ = .64, Q₃ = .74), signifying that plausibility ratings, too, can be made with a considerable amount of interrater agreement. The median r’s for plausibility tend to be somewhat lower than those for depth, although the interdependence among the coefficients does not permit a statistical test of the significance of the differences. It may well be that depth ratings can be made more “reliably” than plausibility ratings, although it is probably more parsimonious to assume (until replication or additional evidence to the contrary) that these are chance differences.

³ One psychiatrist, who met the criteria for inclusion in the sample but whose ratings were conspicuously deviant from his colleagues’, was (after much consideration and debate) dropped from the analysis. Although he claimed that he had understood the instructions, it appeared that his judgments were decidedly atypical. His ratings correlated with those of his ten colleagues as follows: −.45, −.25, −.32, −.36, −.37, −.43, −.37, −.26, −.?0, −.4?.
When the 60 mean ratings by the psychologists were plotted against the mean ratings of the graduate students, the Pearson r was .96. Here, too, there was the suggestion that the less experienced graduate students were rating items as more implausible than the psychologists were: the mean rating for the psychologists was 3.88 (SD = .40), for the graduate students 4.21 (SD = .33). The t of 1.9 does not quite reach the .05 level of significance (two-tailed test) for 18 df.


Depth and Plausibility 


If the plausibility ratings are compared to 
the depth ratings, the relationship should in- 
dicate to what extent these two concepts are 
measuring the same or different things. Hence, 
the mean plausibility ratings of the psycholo- 
gists were plotted against the mean depth rat- 
ings by psychologists. An r of .87 was ob- 
tained, suggesting that ratings based upon 
plausibility estimates are essentially the same 
as depth ratings. Between these two groups 
of raters, where experience was equated, the 
mean ratings on the scales (3.88 for plausibility, 4.15 for depth) did not differ significantly (t = 1.7).

When the plausibility ratings by early 
graduate students were plotted against the 
depth ratings by psychiatrists, a similar re- 
lationship appeared: r = .86. 

Table 2 summarizes the intercorrelations 
among the four groups of raters mentioned 
above, including also the correlation between 
graduate students’ plausibility ratings and 
psychologists’ depth ratings (r = .88), and 
between psychologists’ plausibility ratings and 
psychiatrists’ depth ratings (r = .84). 
Table 2

Intercorrelation Matrix of Mean Ratings (N = 60)
Among Four Groups of Raters

                              Psychiatrists  Psychologists  Psychologists  Grad. students
                              depth          depth          plausibility   plausibility
Psychiatrists depth           —              .96            .84            .86
Psychologists depth                          —              .87            .88
Psychologists plausibility                                  —              .96

Discussion 


Measurement of Depth 


In devising this study it seemed desirable 
to consider and attempt to control a number 
of factors which might be expected to affect 
the ratings. Three such factors appeared par- 
ticularly important. 

1. A rather common way of estimating 
depth clinically seems to be based upon the 
patient’s subsequent reaction to an interpre- 
tation. If the patient strenuously rejects an 
interpretation, it must perforce be deep; an 
avidly accepted interpretation suggests shal- 
lowness. This view is implied, for example, 
by Fenichel (3) and is stated explicitly by 
Hilgard (8, p. 26). Nevertheless, this ap- 
proach has certain limitations. Foremost is 
the consideration that, for practical purposes, 
a therapist would be unable to decide in ad- 
vance whether he should withhold a particu- 
lar interpretation because it might be too 
“deep” at that time. Furthermore, since it 
appears that some therapists are highly astute 
in selecting propitious moments to offer in- 
terpretations, it would seem that depth can 
be estimated prior to, and independent of, 
knowledge of the patient’s postinterpretive 
responses. 

2. An adequate scale of depth cannot, how- 
ever, be independent of the patient to whom 
the interpretations are offered. The very same 
interpretation may be “deep” for one patient 
while quite “shallow” for another. Certainly, 
a sexual interpretation to a psychiatric resi- 
dent analysand may have different depth than 
the identical interpretation given to an illit- 


erate farmhand. This point is frequently 
neglected in loose discussions on the meaning 
and measurement of depth. For example, 
Collier’s scale (2) is constructed purely on 
the basis of the type of therapist behavior, 
quite independent of any specific patient. 
Yet his scale is derived from a concept of 
“degree of uncovering” (i.e., the more the 
interpretation “uncovers,” the greater is its 
depth) which clearly implies the dependence 
of depth upon a specific personality struc- 
ture: how much depth does an “uncovering” 
interpretation have when a patient, prior to 
the interpretation, is already aware of the 
material which the therapist is “uncovering”? 

3. Since depth of interpretation is often 
conceived in relation to the degree of “aware- 
ness” on the patient’s part, one might expect 
that depth ratings will vary as a function of 
the time in therapy when an interpretation 
occurred. As therapy progresses, it is gener- 
ally claimed that patients tend to acquire in- 
creased awareness. Hence, an interpretation 
given late in therapy might not be as deep 
as if it were given very early. It is also possible, however, that a late interpretation would be no deeper than the same interpretation given earlier (i.e., if, relative to the content of the interpretation, the patient’s degree of “awareness” remains unchanged). The mere knowledge that an interpretation occurred in the two-hundred-fiftieth hour rather than the first hour is conceptually irrelevant to how deep it is. There is nothing inherent in the variable of time per se as a determinant of depth. The substantive point concerns knowledge of the patient’s personality at the two-hundred-fiftieth and first hours.

Under these three conditions, the results of 
the present study show that interpretive lev- 
els can be estimated with a high degree of 
reliability by using either “depth” or “plau- 
sibility” ratings. While mean plausibility rat- 
ings may perhaps be slightly more variable 
(SD of .40 and .33 compared with .16 and 
.26), they have the distinct advantage of be- 
ing available for use with comparatively in- 
experienced judges. For a series of interpreta- 
tions, averaging the ratings of a small number 
of therapeutically naive graduate students will 
give a fairly accurate estimate of the mean 
depth rating obtained from experienced psychologists (r = .88) or even more experienced psychiatrists (r = .86). Thus, in the
tedious and time-consuming analysis of thera- 
peutic protocols, the need for professional 
raters might be eliminated, substituting in- 
stead reasonably bright “technicians.” It 
should be noted, however, that less experi- 
enced judges tend to shift their ratings to- 
ward the deeper (more implausible) end of 
the scale, a finding originally observed by 
Harway et al. (7). In certain circumstances 
this additional variable would have to be 
taken into account. 
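The claim that averaging a small number of naive raters gives a fairly accurate estimate can be put in rough numbers with the Spearman–Brown formula. The study itself reports no such analysis. The sketch assumes the judges behave as parallel measures, and it uses the graduate students' median interjudge r of .68 only as an illustrative input.

```python
def spearman_brown(r_single: float, k: int) -> float:
    """Projected reliability of the mean of k judges, assuming parallel judges
    whose pairwise correlation is r_single (Spearman-Brown prophecy formula)."""
    return k * r_single / (1 + (k - 1) * r_single)

# Illustrative input: the median interjudge r reported for the graduate students.
for k in (1, 3, 5, 10):
    print(k, round(spearman_brown(0.68, k), 3))
```

For k = 10 this projects about .955, roughly in line with the .96 agreement observed between the two groups' mean profiles. This is a back-of-the-envelope check, not a result from the study.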

Of course, in one sense, the correlations ob- 
tained in this study are misleadingly high, in- 
asmuch as items sampling a broad range of 
depth were used. In actual practice, raters 
would more often be confronted with a trun- 
cated sample in any one case (limited by the 
theoretical scope and technique of the thera- 
pist), and consequently lower correlations 
would be expected. On the other hand, the 
relatively narrow distributions of interjudge 
correlations are indicative of the satisfactory 
reliabilities to be obtained when specific con- 
trols are placed on the measurements. 


Meaning of Depth 


In the present study, none of the judges 
was given any definition of the term “depth” 
when making his ratings. Nevertheless, the 
mean rating of the psychologists correlates 
.96 with the mean rating of the psychiatrists. 
There seems little doubt, then, that experi- 
enced therapists of various orientations are 
employing some common cues in determin- 
ing depth. This does not mean, however, that 
these therapists share a common, verbalized 
meaning for the concept. Many of the judges 
spontaneously offered comments on their rat- 
ings. The most significant aspect of these 
comments concerned the variability of the 
criteria which the raters believed they were 
utilizing. Among the 20 judges who made 
depth ratings, a variety of definitions was 
offered: 


I tend to equate depth with the impact the state- 
ment might be assumed to have on the patient, and 
with the lack of possibility of verification of the 
statement by observation of the patient’s behavior. 

I used such criteria as the directness of the vo- 
cabulary in the interpretation as related to deep 
unconscious drives; those that were related to instinctual urges were deeper, those related to ego
defenses more shallow. 

. . . ratings certainly involved at least a couple of
dimensions besides depth—psychoanalyticness, range 
of generalization, centralization, centrality of prob- 
lem involved, etc. 

. . . depth is in itself an ambiguous concept, since 
it can be defined from several points of view, e.g.,
developmental-genetic, resistance of subject to ac- 
cepting it, etc. 

I found myself giving ratings toward the “deep” 
end when: (1) interpretation was of material de- 
fended against, not of defense itself; (2) when in- 
terpretation was historical; (3) when interpretation 
tied past and present; (4) when interpretation was 
of symbolic material rather than of overt behavior 
(e.g., dreams, phantasies). 

I sense that I tend to regard the more symbolic 
comments as “deeper.” What deeper means is a little 
obscure. “Nearer manifest content” and “assumed 
latent content” would seem to be approximates of 
“shallow” and “deep.” 


From the preceding list, it would appear 
that—on a verbal level at least—different 
therapists would have difficulty in agreeing 
among themselves as to the meaning of the 
term “depth.” Some therapists, in fact, even 
felt that they lacked consistency in their rat- 
ings (e.g., “I’m certain if I were to do it 
a second time my reliability would be very 
low”). Yet the generally high intercorrelations among the depth raters imply that this
is not really the case, and that there is a con- 
siderable amount of reliability in the cues 
used for rating the present sample. (One psy- 
chiatrist was kind enough to repeat his rat- 
ings after a two-day interval: test-retest r 
was .94.) 

The precise nature of these cues, of course, 
remains unknown. The data do, however, sug- 
gest that inferences in regard to depth are 
possibly based upon cues similar to the ones 
used in drawing plausibility inferences. The 
cues used by an inexperienced rater in in- 
ferring the plausibility of a statement may 
or may not be the very same cues an experi- 
enced therapist uses in estimating depth. In 
either case, however, there is the interesting 
implication that at least some of the cues be- 
lieved to be important by professional thera- 
pists may be irrelevant: specifically, that class 
of cues associated with a therapist’s theo- 
retical and professional training. Within the 





254 Seymour Fisher 


limits of the present results, it seems that ad- 
vanced training in therapy and psychodynam- 
ics is not prerequisite for a knowledge of 
what is “deep” or “shallow”; individuals 
without such training are capable of arriving 
at essentially similar conclusions.⁴


Depth and Plausibility 


In view of the strong relationship between 
depth and plausibility ratings, it is tempting 
to intimate that when a therapist refers to 
the depth of an interpretation, he is doing 
nothing more than making an estimate of 
how implausible his statement would appear 
to this particular patient at this particular 
time. Clearly, the data from this limited 
study do not provide sufficient support for 
such a contention. But even in the absence 
of adequate empirical verification, this over- 
simplified notion has a number of features 
which might be elaborated. 

1. If we equate depth with perceived plau- 
sibility, it becomes possible to understand 
why many therapists verbalize their definition 
of depth in terms of the patient’s response to 
an interpretation. It is much easier to infer 
how plausible a statement was (using the patient’s subsequent behavior as additional information) than to estimate how plausible the
statement will be. Without such information, 
however, it is still possible to make predic- 
tions of the patient’s subsequent reactions 
based upon knowledge of his previous be- 
havior. Of incidental interest is the fact that 
other investigators (5, 10) have reported that 
therapists are no better than laymen in mak- 
ing these predictions. 

⁴ This is not meant to deny the possibility that the professional therapist is more sensitive to relevant cues as a result of his experience. In the present study all possible cues were neatly presented to the judges in the case history and Rorschach summary, thus minimizing the importance of the rater’s perceptive abilities. In a situation where the number of cues is virtually infinite (e.g., a live doctor-patient interaction or even the evaluation of a complete series of protocols), the inexperienced individual might be seriously handicapped in deciding how plausible a statement is to a particular patient.

2. While the concept of depth assumes truth-content of an interpretation, this limitation disappears when a dimension of “perceived truth” or plausibility is used. As Festinger (4) has observed, many opinions have
no inherent truth value in respect to the 
physical world. This seems particularly appli- 
cable to the content of therapeutic interpre- 
tations: whether an interpretation is deemed 
“true” or not generally depends upon the 
training of the therapist, with an Adlerian 
disputing the “truth” of a Freudian inter- 
pretation, etc. (for some interpretations re- 
ferring to the effects of repressed hostility 
and sexual impulses, one suspects that the 
Zeitgeist also plays a role in determining their 
alleged truth value). Suppose a therapist says, 
“Your anxiety stems from the devil, an evil 
spirit who lives in your stomach and makes 
you quiver inside and outside every time you 
become angry.” Is this interpretation deep? 
Of a number of therapists who were asked 
this informally, many replied that the ques- 
tion had no meaning, that the “interpreta- 
tion” could be considered neither deep nor 
shallow because it is patently absurd (i.e., 
obviously “untrue”).⁵ Nevertheless, this may
not be absurd to a particular kind of patient. 
For one who believes in the reality of devils 
residing in the viscera, the therapist’s state- 
ment may be perceived as being true (i.e., 
plausible), and conceivably could have a sig- 
nificant effect upon the therapeutic process. 
The concept of plausibility permits evaluation 
of any therapist interpretation, irrespective of 
the truth or falsity of its content. 

3. From the working hypothesis that depth 
and plausibility refer to the same underlying 
construct, it is possible to attempt to inte- 
grate recent experimental studies in social 
psychology with the study of the psycho- 
therapeutic process. Depth can be considered 


⁵ But if a therapist in 1956 were to tell a patient,
in more simple language perhaps, that his anxiety 
was due to the practice of coitus interruptus (4, p. 
187), would this be considered as “true”? It is inter- 
esting to note that virtually all common definitions 
of an “interpretation” tacitly assume the empirical 
validity of the causal relationship expressed by the 
therapist. Whether one talks about “uncovering,” 
“unconscious” (i.e., patient being “unaware”), or 
the like, the frame of reference always seems to be 
one in which it is hoped that the patient will ulti- 
mately come to recognize the “truth” of what the 
therapist is implicitly or explicitly communicating. 
Perhaps it is sufficient for the patient to accept (in 
the sense of internalizing a new basic belief) what 
the therapist is suggesting. 
as a form of “distance,” a concept that is receiving some current attention and one which
can be controlled and manipulated system- 
atically in a laboratory setting. By “distance” 
is meant the discrepancy between two per- 
sons’ opinions. That is, if P (a prestige 
source) expresses an opinion quite similar to 
that held by S (subject), the distance is 
small; if P expresses a judgment markedly 
different from S’s belief, the distance is large. 
It is self-evident that, other things being 
equal, plausibility is a necessary monotonic 
correlative of distance, larger distances being 
more implausible than smaller distances. Now 
an interpretation is a particular type of judg- 
ment by a therapist, one which can be con- 
ceptualized as lying along a dimension of dis- 
tance. A “deep” (implausible) interpretation 
is one with a large distance; a “shallow” 
(plausible) interpretation has little distance. A therapist statement which has zero distance is not an interpretation (e.g., patient
says “I hate my father”; therapist then says 
“You hate your father”). Thus a new and 
unified working definition of interpretation is 
suggested, one which includes in its definition 
the dimension of depth and at the same time 
eliminates the need for any assumptions re- 
garding truth value: 

An interpretation is an implicit or explicit 
judgment about the patient’s motivational 
and emotional behavior where the distance is 
greater than zero; the magnitude of the dis- 


tance is a measure of the interpretation’s 
depth. 
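The working definition can be stated as a small function. This is a hypothetical coding, not a procedure from the study. It assumes the patient's current belief and the therapist's judgment can both be placed on one common numeric opinion scale. The text does not say how such positions would be measured.

```python
from typing import Optional

def interpretation_depth(patient_belief: float,
                         therapist_judgment: float) -> Optional[float]:
    """Hypothetical coding of the distance definition.

    Both arguments are positions on one common opinion scale (an assumption).
    Returns None when the distance is zero (the statement is not an
    interpretation); otherwise returns the distance, taken as the depth.
    """
    distance = abs(therapist_judgment - patient_belief)
    return None if distance == 0 else distance

# "I hate my father" -> "You hate your father": zero distance, not an interpretation.
print(interpretation_depth(3.0, 3.0))
# A judgment far from the patient's belief scores as deeper than a nearby one.
print(interpretation_depth(3.0, 6.5), interpretation_depth(3.0, 3.5))
```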


Implications 


If one is willing to entertain the possibility 
that psychotherapy is an instance of a two- 
person interaction series where one individual 
(therapist) attempts to modify the belief-sys- 
tems of another individual (patient) by “so- 
cial influence,” then some recent laboratory 
findings might have bearing upon the study 
of therapy. For example, in a two-person 
group it has been shown that distance (and 
consequently plausibility) is a significant
variable in determining the degree to which 
S’s verbal judgment will conform to P’s: in 
general, the larger the distance the less S will 
conform, the function being negatively ac- 


celerated (9). From this generalization, it 
should follow that, other things being equal, 
deep interpretations should be rejected by 
the patient more frequently than shallow in- 
terpretations. It has also been found that 
when continued pressure is placed upon S 
over a series of interactions with a prestige 
figure, under certain conditions S will subse- 
quently show more conformity to judgments 
by P than ever would have been expected had 
P made these judgments early in the interac- 
tion series (6). In other words, judgments 
which would have been extremely implausible 
if given at the beginning of interaction be- 
tween P and S tend to become considerably 
more plausible after certain kinds of interac- 
tion have occurred over a period of time. 
These results may possibly offer a clue as to 
how some patients, following long-term psy- 
chotherapy, have acquired beliefs from their 
therapist (e.g., “castration fear is the basis 
of my difficulties”) which in all probability 
would have been rejected if the therapist had 
introduced these interpretations in the initial 
phases of therapy. 

As more research findings emerge concern- 
ing effects of distance in continuous interac- 
tions, it would not be too surprising to find 
that the role played by interpretations in the 
nebulous process of psychotherapy is consid- 
erably illuminated. 


Summary and Conclusions 


Four groups of judges, three of which dif- 
fered in therapeutic experience, were asked to 
rate a series of 60 psychotherapeutic interpre- 
tations. Two groups rated for “depth”; two 
groups rated for “plausibility.” High intra- 
group agreement was obtained on both depth 
and plausibility scales. When mean plausi- 
bility ratings were correlated with mean depth 
ratings, a sufficiently high relationship was 
found to suggest that the two scales may 
be measuring the same underlying construct. 
Plausibility ratings of therapeutically naive 
graduate students correlated .88 with depth 
ratings from experienced psychologists and 
.86 with psychiatrists’ depth ratings, imply- 
ing that advanced training in therapy and 
psychodynamics may not be prerequisite for 
a knowledge of what is “deep” or “shallow.” 
It was also noted that less experienced raters
tend to judge interpretations as deeper (and
more implausible) than experienced therapists.
The possible advantages and implications of
conceptualizing depth in terms of plausibility
are discussed.


Received September 13, 1955. 
Personality Patterns of Neurotic Adults
in Psychotherapy 


Maurice Lorr and Eli A. Rubinstein¹


The Veterans Administration and Catholic University of America 


There is wide agreement that present-day 
methods for evaluating outcomes of psycho- 
therapy are unsatisfactory. There are no gen- 
erally accepted measures of morbidity or of 
change and few that have gained even a par- 
tial acceptance from research workers. What 
is needed is a more objective system of evalu- 
ation than is currently available and a set of 
measures descriptive of the principal dimen- 
sions of personality and mental illness. The 
study reported here is one part of a program 
designed to develop criteria of mental illness 
and to isolate some of its more important di- 
mensions. 


In a recent analysis by the authors (3) the cor- 
relations between 58 ratings of a group of 184 vet- 
erans in psychotherapy were found to be well ac- 
counted for by ten personality and symptom pat- 
terns. Seven of these patterns were found to bear 
varying degrees of resemblance to factors identified 
in earlier studies. However, it seemed highly desir- 
able to establish more firmly the existence of these 
ten parameters, of which several had not been identi- 
fied previously. Clearly if these factors could be iso- 
lated repeatedly in patient samples differing widely 
in composition and with varying combinations of 
reference variables, they would be acceptable at a 
much higher level of confidence. 


With this goal in mind the present study 
was designed to test whether ten personal- 
ity and symptom patterns recently identified 
could be confirmed and clarified in a second 
group of patients. The initial study was based 
on neurotic and psychotic patients who had 
been in treatment on the average for three 
months when rated. In this analysis the pa- 
tients included were restricted to neurotics 
who had been in treatment for three or four
sessions when rated. The rating schedule used
in the original study was also modified. The
descriptive scales used for rating patients
in the initial analysis were converted into
graphic form. Scale titles were removed and
scales were presented in a random order to
the raters. The question was whether the same
ten parameters would be isolated again despite
these changes in composition of the sample and
alteration in the format of the rating schedule.

1 From the Veterans Benefits Office, Washington,
D. C.

The study was structured around ten hy- 
pothesized constructs. The underlying pat- 
terns or syndromes included for experimental 
test were the following: (a) lack of emotional
restraint vs. inhibition of emotionality,
(b) degree of distortion of reality in thinking
and perception, (c) anxious tension vs. relaxed
comfort, (d) sense of personal adequacy
vs. its lack, (e) dependent immaturity vs.
independent maturity, (f) conscientiousness
vs. its lack, (g) obsessive-phobic reaction,
(h) gastrointestinal reaction, (i) cardiorespiratory
reaction, (j) conflict between sex
impulses and moral standards.

Included also were five new scales descrip- 
tive of alcoholic addiction, defensiveness, im- 
pulsiveness vs. deliberateness, avoidance of 
attention vs. self-dramatization, and
acceptance vs. resistance to regulation by others.
These were constructed to further clarify 
factors already identified. 


Procedure 


The sample consisted of 215 nonpsychotic 
male World War II and Korean veterans 
with service-incurred psychiatric disabilities. 
Eleven Veterans Administration mental hy- 
giene clinics ranging, geographically, from 
Los Angeles to Newark collaborated in the
study.² All patients were newly accepted for
treatment and 90 per cent had never before 
received any formal psychotherapy. The me- 
dian age was 32 and the median grade com- 
pleted was 11. 


Each patient was rated once by his own therapist 
on each of the 61 scales of the MSRPP (4) after 
the third or fourth interview. The raters were pre- 
dominantly staff therapists (psychiatrists, clinical 
psychologists, and social workers), but a few were 
advanced psychology trainees and psychiatric resi- 
dents. Raters in each clinic participated in several 
training sessions and were provided ample oppor- 
tunity to discuss and compare their practice ratings 
of patients and colleagues. 

The rating schedule (MSRPP) consisted of 61 
unlabeled, randomly ordered, four- and six-point 
graphic rating scales. A detailed rating guide was 
attached as a preface to each rating form.
Independent studies of interrater agreement indicated
that the average product-moment correlation
between raters following two or three training sessions
was about .75.

Fifty of the 61 scales were selected for the factor 
analysis. Of the 11 scales omitted, most were dropped
because of lack of relevance to the study, a few
because they were too often left unrated, and the
remainder because the symptom (e.g., skin
disturbances) was rarely noted. Product-moment
correlations between the 50 scales were obtained by 
IBM procedures. The variable correlations were then 
arranged into ten clusters on the basis of the fac- 
tors hypothesized. The ten clusters and their defining 
variables were as follows: Cluster A, variables 1 to 
6; Cluster B, variables 7 to 13; Cluster C, variables 
19 to 24; Cluster D, variables 25 to 29; Cluster E, 
variables 30 to 34; Cluster F, variables 35 to 39; 
Cluster G, variables 14 to 18; Cluster H, variables 
40 to 44; Cluster I, variables 45 to 46; Cluster J,
variables 47 to 50. See Table 1. 
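The hypothesized cluster arrangement amounts to a key that assigns each of the 50 variables to exactly one cluster, and a 0/1 grouping matrix of this kind is the input a multiple group extraction works from. A small sketch, assuming Cluster H covers variables 40 to 44, the cardiorespiratory cluster (lettered I here) covers 45 and 46, and Cluster J covers 47 to 50, which matches the order of the scales in Table 1. The names `CLUSTERS` and `grouping_matrix` are illustrative only.

```python
# Hypothesized cluster key: cluster letter -> inclusive range of variable numbers.
# Assumption: H = 40-44, I = 45-46, J = 47-50, following the scale order of Table 1.
CLUSTERS = {
    "A": (1, 6), "B": (7, 13), "G": (14, 18), "C": (19, 24),
    "D": (25, 29), "E": (30, 34), "F": (35, 39), "H": (40, 44),
    "I": (45, 46), "J": (47, 50),
}


def grouping_matrix(clusters, n_vars=50):
    """Return (cluster names, 0/1 matrix: one row per variable, one column per cluster)."""
    names = list(clusters)
    matrix = [
        [1 if clusters[c][0] <= v <= clusters[c][1] else 0 for c in names]
        for v in range(1, n_vars + 1)
    ]
    return names, matrix


names, G = grouping_matrix(CLUSTERS)
# Each variable should fall in exactly one hypothesized cluster.
assert all(sum(row) == 1 for row in G)
print(names)
```

Writing the key out this way also makes overlapping or missing ranges easy to catch: the assertion fails if any variable is assigned to zero clusters or to more than one.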

From the 50 × 50 clustered correlation matrix, 10
factors were extracted simultaneously by the multiple
group method with centroid estimates of the
communalities in the diagonal cells. No evidence of
additional factors was observable in the residuals.
The obtained oblique matrix was orthogonalized and
then rotated blind by the single-plane method to
oblique simple structure. The structure was
sufficiently well defined by the initial cluster
arrangement that on the average only two rotations
were required to locate a plane. The oblique rotated
factor matrix V is given in Table 2.³

2 The continued cooperation of the respective clinic
chiefs, chief clinical psychologists, and station
investigators from the Veterans Administration mental
hygiene clinics in Baltimore, Brooklyn, Chicago,
Denver, Detroit, Los Angeles, Newark, Pittsburgh,
St. Paul, San Francisco, and Washington, D. C., is
most gratefully acknowledged. The authors also
acknowledge the invaluable assistance of Elizabeth
Turk with the statistical computations.
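A compact sketch of multiple group extraction, under stated assumptions. Given a correlation matrix R with communality estimates in the diagonal and a 0/1 grouping matrix G, the group factors can be orthogonalized by a Cholesky factorization, L Lᵀ = Gᵀ R G, giving orthogonal loadings A = R G L⁻ᵀ and residuals R − A Aᵀ. The communality substitute below (each variable's largest absolute off-diagonal correlation) is a common textbook stand-in, not the centroid estimate used in the study; the Cholesky step is one standard orthogonalization, and the study's exact computing routine is not described here. The single-plane rotation is not shown. The toy 4 × 4 matrix and all function names are hypothetical.

```python
import math


def matmul(a, b):
    """Product of two matrices given as nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def with_communalities(R):
    """Copy of R with each diagonal cell replaced by the row's largest |off-diagonal r|."""
    n = len(R)
    return [[R[i][j] if i != j else max(abs(R[i][k]) for k in range(n) if k != i)
             for j in range(n)] for i in range(n)]


def cholesky(m):
    """Lower-triangular L with L L^T = m (m symmetric positive definite)."""
    n = len(m)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = m[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L


def forward_sub(L, b):
    """Solve L x = b for x, with L lower-triangular."""
    x = []
    for i, bi in enumerate(b):
        x.append((bi - sum(L[i][k] * x[k] for k in range(i))) / L[i][i])
    return x


def multiple_group_loadings(R, G):
    """Orthogonal loadings A = R G L^-T, where L L^T = G^T R G."""
    RG = matmul(R, G)
    L = cholesky(matmul(transpose(G), RG))
    return [forward_sub(L, row) for row in RG]


def residuals(R, A):
    """R minus the correlations reproduced by the loadings (A A^T)."""
    reproduced = matmul(A, transpose(A))
    return [[r - p for r, p in zip(r_row, p_row)] for r_row, p_row in zip(R, reproduced)]


# Hypothetical correlations: variables 1-2 form one cluster, 3-4 another.
R = [[1.00, 0.64, 0.00, 0.00],
     [0.64, 1.00, 0.00, 0.00],
     [0.00, 0.00, 1.00, 0.49],
     [0.00, 0.00, 0.49, 1.00]]
G = [[1, 0], [1, 0], [0, 1], [0, 1]]

R_h = with_communalities(R)
A = multiple_group_loadings(R_h, G)
print([[round(v, 2) for v in row] for row in A])
print(max(abs(v) for row in residuals(R_h, A) for v in row))
```

With these toy values the two group factors reproduce the adjusted matrix exactly (loadings .8 and .7), so the residuals vanish up to rounding. With real data, inspecting the residual matrix in this way is how one would look for evidence of additional factors.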


The First-Order Factors 


Only those variables whose correlations 
with a factor are .30 or higher are used in 
the interpretations of the first-order factors. 
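The .30 rule above is a simple filter on one column of the factor matrix. A minimal sketch, using a few entries from the Factor G column of Table 2 as read from this copy (decimal points restored). The function name `salient_scales` is illustrative, and treating negative values by their absolute size is an assumption, on the reading that a bipolar factor is interpreted from both of its ends.

```python
def salient_scales(loadings, cutoff=0.30):
    """Scale numbers whose correlation with a factor is at least `cutoff` in size.

    Uses the absolute value, on the assumption that negative correlations
    with a bipolar factor count as well.
    """
    return sorted(n for n, r in loadings.items() if abs(r) >= cutoff)


# A few correlations with Factor G (scale number -> correlation).
factor_g = {11: 0.18, 14: 0.50, 15: 0.56, 16: 0.34, 17: 0.40, 18: 0.41, 22: 0.19, 40: -0.17}
print(salient_scales(factor_g))  # [14, 15, 16, 17, 18]
```

The scales selected here (reality distortion, insistent thoughts, self-preoccupation, morbid fears, ideas of reference) are the ones used below to interpret Factor G.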

Factor A defines a clear-cut bipolar pa- 
rameter of emotional responsiveness identical 
with the A previously reported. A patient 
scoring high on A is relatively unrestrained 
in his behavior, moody, overreactive emotion- 
ally, inclined to dramatize himself, impulsive 
and overt in his expressions of hostility. In 
addition he is likely to be cheerful in mood 
and enthusiastic in his interests. A low-scor- 
ing patient is relatively inhibited and re- 
strained in his emotional behavior, unable to 
express anger, hate, or other feelings openly, 
and inclined to avoid drawing attention to 
himself. Clinicians are likely to recognize the 
extremes of this pattern. It is common prac- 
tice to categorize some patients as overcon- 
trolled and unresponsive emotionally and 
others as lacking in restraint and control. 

Factor B appears to be measuring a pa- 
rameter descriptive of a hostile, suspicious 
rebelliousness against adults in authority. At 
the maladjusted end the patient is hostile and 
suspicious of others, resentful of authority, 
defensive in attitude, inclined to blame others 
for his difficulties, unlikely to give way in 
conflicts, and is resistive to regulation by 
others. In addition he is markedly lacking in 
concern for the welfare of others. Factor B 
resembles the B of the earlier study with re- 
spect to maladjustment in interpersonal rela- 
tions but it lacks the gross distortion of think- 
ing and perception of the latter. In the initial 
study, reality distortion, No. 14, and ideas of
reference, No. 18, correlated with B. In the
present study these two variables have their
principal projections on G. These differences
are possibly due to the absence of patients
with obvious psychotic symptomatology in
the present sample.

The third Factor C appears to be identical
with the C of the earlier study except for the
presence of a depressive element. A manifest
tension, depressive in character, appears to
underlie the factor. The maladjusted end of
C is defined by signs of manifest tension, an
apprehensiveness not called for by
circumstances, irritability, indications of a
depressive mood, and difficulties with sleep.
In clinical practice Factor C would probably
be regarded as an anxiety reaction.

3 The following additional tables have been
deposited with the American Documentation Institute:
Table A, Orthogonal Factor Matrix F of Correlations
of 50 Scales with Each Factor; Table B,
Transformation Matrix; Table C, Correlations
Between Primary Factors; Table D, The Second
Order Factor Matrix F. Order Document No. 4850
from ADI Auxiliary Publications Project,
Photoduplication Service, Library of Congress,
Washington 25, D. C., remitting in advance $1.25
for microfilm or $1.25 for photocopies. Make
checks payable to Chief, Photoduplication Service,
Library of Congress.

Table 1

Brief Descriptions of Both Ends of the 50 Scale Variables Factored

 1  unrestrained feeling                    restrained feeling
 2  expresses hostility overtly             conceals hostility
 3  emotional over-response                 emotional under-response
 4  frequent mood changes                   no mood changes
 5  highly impulsive                        highly deliberate
 6  dramatizes self                         shrinks from attention
 7  blames self for difficulties            blames others for difficulties
 8  bears little hostility                  bears much hostility
 9  perceives world as friendly             perceives world as hostile
10  acceptant of authority                  resentful of authority
11  rarely defensive                        usually defensive
12  open and trusting                       usually suspicious
13  accepting of regulation                 resistive to regulation
14  no distortions of reality               marked distortions of reality
15  no insistent thoughts                   frequent recurring thoughts
16  attentive to outside matters            self-preoccupied
17  no morbid fears                         disrupted by morbid fears
18  no unjustified suspicion                has ideas of reference
19  relaxed                                 tense
20  rare sleep difficulty                   frequent sleep difficulty
21  rarely irritable                        frequently irritable
22  elated                                  depressed
23  untroubled                              disrupted by anxiety
24  alcohol not a problem                   alcohol disrupting behavior
25  rarely gives way to others              usually gives way to others
26  unaffected by approval need             much affected by approval need
27  seldom feels inadequate                 usually feels inadequate
28  strong belief in self                   weak belief in self
29  untroubled by guilt                     much troubled by guilt
30  neglects responsibility                 over-accepting of responsibility
31  dependent on others                     wholly self-reliant
32  weak interests                          strong, active interests
33  low achievement motivation              high achievement motivation
34  low standards                           high standards
35  weak impulse control                    strong impulse control
36  careless with tasks                     painstaking with tasks
37  little concern for orderliness          strong concern for orderliness
38  no concern for anyone                   strong concern for others
39  never considers tomorrow                much consideration of the future
40  no headache complaints                  disturbing headaches
41  energetic                               tired and worn out
42  no gastric symptoms                     severe gastric symptoms
43  no intestinal or bowel symptoms         severe intestinal or bowel symptoms
44  no body concern                         extreme body concern
45  no respiratory symptoms                 severe respiratory symptoms
46  no cardiovascular symptoms              severe cardiovascular symptoms
47  [illegible]                             disturbed by sex conflict
48  no compulsive acts                      disturbed by compulsive acts
49  no concern over homosexual tendencies   concern over homosexual tendencies
50  no concern over masturbation            guilt over masturbation

Table 2

Rotated Oblique Factor Matrix V
(Decimal points are omitted. The columns for Factors
A, B, I, and J are not legible in this copy.)

                     Factors
Scale    C    D    E    F    G    H
    1   04   07   06   02   02   13
    2  -08   34   19  -25   00   06
    3   02  -04  -11  -07   03  -01
    4  -02   08  -05  -05   03  -05
    5  -20   16  -04   45   06   03
    6   14  -13  -18   25   10  -01
    7  -21  -12  -17   25   03   02
    8   07  -03  -05   19   03  -05
    9   15   33   02   00   01   13
   10  -02   08   16  -04   00   08
   11  -01   13   09   16   18   06
   12   20   01   06  -01   18   16
   13   04  -08   13  -17  -02   03
   14  -01   03  -08   01   50   04
   15  -02   00   08  -03   56  -03
   16   14   00  -15   06   34   01
   17  -11  -03   03   15   40   03
   18   05  -05  -09   07   41   04
   19   43   17  -02  -06   00   02
   20   31  -04  -11   05   06   31
   21   35  -14  -10   06  -03   04
   22   33  -07  -23   27   19   02
   23   42   18   05   07   12  -02
   24   24  -08  -02  -25   09  -04
   25  -06   42  -25   06  -12  -05
   26  -15   35   16   22  -11   04
   27  -01   65   00  -02   01   01
   28   03   51  -39   05   04  -21
   29   05   21   03   16  -07  -02
   30  -08  -12   33   40  -05   05
   31   11  -28   61  -13  -05   14
   32  -18  -02   63  -04  -08   12
   33  -05  -02   72   06   01   05
   34   03   21   70  -03   01  -05
   35  -13   13   25   16  -06   03
   36   02  -20   26   48   02  -03
   37   01   01  -02   64  -02   05
   38  -03   02   27   13  -05   00
   39   23  -03   53   11  -08  -05
   40   34  -03  -09  -08  -17   17
   41   01   19  -39   11   15   29
   42  -04   03  -03   05  -01   67
   43   06  -02   15  -08   06   64
   44  -03   03  -19   29   00   55
   45  -15   10   06  -03   02  -09
   46   00  -05  -04   04  -02   11
   47   00  -08  -09   02   20  -03
   48  -03  -12   12  -04   23   03
   49  -04  -04  -02   08   08  -03
   50   00   03   07  -03  -02   04

The fourth Factor D is descriptive of the 
sense of personal adequacy in the patient. At 
the maladjusted end of the scale the patient 
has a marked tendency to feel inadequate, is 
lacking in a belief in himself and his powers, 
is inclined to give way and defer to others in 
conflicts, and is much affected in his behav- 
ior by a need to secure and maintain their 
approval. Except for the absence of signifi- 
cant projection from guilt feelings, No. 36, 
D is the same as the Factor D isolated in the 
earlier study. 

E is a bipolar factor descriptive of drive 
toward long-term goals. At one end of the 
continuum is the patient whose motivation 
for long-term goals such as achievement, ac- 
ceptance, recognition, or power is low, and 
whose standards of adequacy of performance, 
in areas he considers important, are also low. 
His interests are weak, his energy is low,
and he lacks belief in himself. He
seems never to consider tomorrow, and is in- 
clined to depend on others for the initiation 
and direction of his activities. The opposite 
end of the continuum describes the individual 
whose activities are much determined by long- 
term goals. His interests are strong, his en- 
ergy is high, and he is inclined to be self- 
confident and self-reliant. In the initial study 
a corresponding Factor E of independent ma- 
turity was identified. This factor contained 
elements of what might be described as re- 
sponsible maturity and goal-directed control 
or persistence. In the present study the goal- 
direction and independence are more promi- 
nent than tendencies toward acceptance of 
adult responsibility. 

The bipolar Factor F is defined at one end 
by a strong concern for system, orderliness, 
or routine in activities and surroundings, and 
by the careful and painstaking performance 
of tasks undertaken. An inclination to be
deliberate and reflective prior to action and a
tendency to accept more than one’s share of
responsibility and obligation are also
characteristic of F. The opposite end of F is
characterized by a careless indifference to the
performance of tasks, a lack of concern about
adherence to accustomed patterns, impulsiveness,
and a tendency to avoid or neglect
responsibilities. Patients scoring high on the
conscientious pole of F correspond closely to 
the compulsive personality delineated in the 
American Psychiatric Association’s manual of 
mental disorders (1). Such patients are said 
to be characterized by chronic, excessive, or 
obsessive concern with adherence to stand- 
ards of conscience or conformity. A character 
trait of conscientiousness vs. careless indif- 
ference appears to be represented in F. Es- 
sentially the same factor was isolated in the 
initial study. 

In the initial study it was stated that Fac- 
tor G appears to be defining an obsessive- 
compulsive-phobic reaction. However, it was 
felt that this inference required cross check- 
ing. In the present study the patient scoring 
high on G tends to be preoccupied with in- 
sistent, recurring, useless thoughts, tends to 
distort reality in his perception and thinking, 
is disturbed by morbid fears, and is inclined 
to believe on the basis of slight evidence that 
people talk about or refer to him. As might 
be expected he is also described as much pre- 
occupied with himself. The hypothesis of an 
obsessive-compulsive reaction is plausible but 
the ideas of reference seem out of place. An 
alternative hypothesis is that G represents 
one aspect of an incipient or latent schizo- 
phrenic tendency. Obvious psychotics exhibit- 
ing delusions, hallucinations, and grosser forms 
of reality distortion and falsification were not 
included in the present study. However, no 
rigorous effort was made to exclude patients 
who might show indications of schizophrenic 
tendencies, since the data were collected for 
another purpose. The prominence of distor- 
tions of thinking and perception and the ex- 
tent of disruption of normal activities sug- 
gest that this parameter represents a more 
profound disturbance than an
obsessive-compulsive-phobic reaction.

The scales defining Factor H describe a 
gastrointestinal reaction. The high-scoring 
patient is likely to present symptoms of 
stomach, intestinal, and bowel disorders, and 
accordingly is likely to be much preoccupied 
with his health and the functioning of his 
bodily organs. Except for the absence of 
headaches, Factor H is identical with the
corresponding factor previously identified. 

Factor I is defined only by the presence of
symptoms of respiratory disturbance (asthma,
hay fever) and of cardiovascular symptoms
(tachycardia, hypertension). As in the
earlier study, I represents a cardiorespiratory
reaction. It is possible that if the specific
syndromes included under these two scales (such
as asthma and hypertension) were individually
scaled, Factor I would break up into several
parameters.

The last Factor J is characterized by con- 
cern or guilt over masturbation, conflict be- 
tween sexual impulses and beliefs, concern 
over homosexual tendencies, and impulses to 
perform irrational unnecessary acts which 
are resisted only with considerable discom- 
fort. It seems clear that J is the same as the 
corresponding factor earlier identified. It rep- 
resents a conflict within the patient between 
his sexual impulses and his moral and social 
standards. The conflict finds expression peri- 
odically in overt repetitive acts. 


The Second-Order Factors 


The correlations between the ten primary 
vectors were computed, rearranged into two 
clusters, and factored by the multiple group 
procedure to yield two orthogonal factors not 
requiring rotation (2). To facilitate inter- 
pretation of the second-order factors, the cor- 
relations of the fifty scale variables with 
these two factors were computed by means 
of a suitable matrix transformation (5). 
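Reference (5) is not reproduced here, so the following is only one standard way such correlations can be obtained, under the assumption of a hierarchical model: each standardized scale is x = P f + e, each standardized primary factor is f = W s + u, the second-order factors s are orthogonal and standardized, and the residual parts e and u are uncorrelated with s. Under those assumptions the correlations of the scales with the second-order factors are simply the matrix product P W, where P is the primary-factor pattern (not the reference-structure matrix V). The matrices below are hypothetical.

```python
def matmul(a, b):
    """Product of two matrices given as nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


# Hypothetical primary-factor pattern P: 3 scales (rows) x 2 primary factors.
P = [[0.7, 0.0],
     [0.0, 0.6],
     [0.4, 0.3]]

# Hypothetical loadings W of the 2 primaries on 1 orthogonal second-order factor.
W = [[0.8],
     [0.5]]

# Correlations of each scale with the second-order factor.
scale_by_second_order = matmul(P, W)
print([[round(v, 2) for v in row] for row in scale_by_second_order])
```

The third scale illustrates the point of the transformation: although it loads on both primaries, its correlation with the second-order factor combines both paths (.4 × .8 + .3 × .5).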

Table 3 presents the scale correlations with 
Factors AA and BB. Factor AA is defined by 
an anxious tension, a lowered sense of per- 
sonal adequacy, much preoccupation with in- 
sistent thoughts and morbid fears, and to a 
lesser extent by conflict over sexual prob- 
lems. The major element seems to comprise 
an agitated, intropunitive self-preoccupation. 
The pattern suggests a type of personality 
disorganization and conflict that occurs when 
ego defenses crumble. 

Factor BB is independent of AA and represents
almost as much of the variance as the latter.
BB is most strongly defined by primary factors
B, E, and F. Hostility, resentment of authority,
suspicion, and defensiveness are joined with a
strong drive for long-term goals and a conforming
conscientiousness. Compared to AA, this pattern
represents a tight defensive system with
extrapunitive elements.

Table 3

Correlations of the 50 Scales on the
Second-Order Factors
(Decimal points are omitted.)

Scale   AA   BB      Scale   AA   BB
    1  -04   44         26   34  -06
    2   14  -09         27   55   02
    3  -14   23         28   46  -17
    4  -11   12         29   38   10
    5   15   29         30  -06   32
    6   10   30         31  -31   30
    7  -36   33         32  -13   06
    8  -05   60         33   03   35
    9   22   37         34   18   36
   10  -07   49         35   03    ?
   11  -15   53         36   00   51
   12   03   55         37   20   45
   13  -21   22         38   13  -15
   14   47   02         39   11   39
   15   52   02         40  -02  -11
   16   28   04         41   24  -06
   17   49   04         42   00  -14
   18   29   18         43   06   00
   19   37  -07         44   14   05
   20   28  -10         45   13  -19
   21   09   09         46  -04  -09
   22   19   41         47   37   09
   23   50   07         48   28   02
   24   14  -21         49   24   18
   25   35  -24         50   36  -07


Discussion 


In terms of our initial aim to find a set of 
measures more descriptive of the principal 
dimensions of personality and mental illness 
than currently available, there is one perti- 
nent question that might be asked. To what 
extent do these factors fit into currently ac- 
cepted personality theory? It would seem 
easy, perhaps deceptively too easy, to recon- 
cile these factors, in terms of both origin and 
dynamics, with a number of different theo- 
retical frameworks. The difficulty is not really 
in linking these factors, as parameters of per- 
sonality functioning, with one or another of 
the current theories but rather in choosing 
from among an embarrassing number of 
equally plausible constructions. This is
perhaps the same problem the clinician has when
confronted with his test data and before he
formulates his explanation of the dynamics of
the patient. Alternative hypotheses are readily
available, and there is little doubt, especially
if he talks about “basic” dynamics, that his
ultimate choice depends as much, if not more,
on his own theoretical persuasion as on the 
goodness of fit of data to hypothesis. How- 
ever, in the present study, primary interest 
was focused on establishing the factors more 
firmly on an independent second sample. 
Rather than speculate with little supporting 
data it seemed preferable to leave for an- 
other study the experimental testing of one 
or more hypotheses that might be proposed 
to explain the parameters identified. 

The problem of theoretical formulation is 
further complicated by the fact that the fac- 
tors cannot be considered in isolation. It 
seems highly likely that they operate con- 
figurally in the individual. Thus, a high level 
of motivation for long-term goals (Factor E)
is commonly regarded as a positive trait. But 
it is not so clearly positive if it is associated 
with restraint of emotional expression (Fac- 
tor A), a hostile rebelliousness (Factor B), 
and an excessive conscientiousness (Factor 
F). Similar complex interactions can be noted 
for most of the other factors. The organiza- 
tion of these parameters in the various per- 
sonality types must be considered, although 
each factor is relatively independent in the 
population as a whole. 

Another problem that merits discussion con- 
cerns the relation of the factors identified in 
this population to those likely to be obtained 
for a “normal” group of adults. It is our pre- 
sumption that most neurotic behavior is on 
the same continuum but is an extreme in 
either direction from “normal” behavior. For
example, an excess of restraint in expression 
of feeling and emotion and the lack of ap- 
propriate control of emotional expression rep- 
resent behavior deviations. Both are extremes 
of a trait of emotional responsiveness. If this 
assumption is valid, and there is much to 
support it, then the ten personality parame- 
ters represented by our factors are also de- 
scriptive of normal individuals. 


Summary 


This study sought to confirm and clarify 
ten personality and symptom patterns previously
isolated. A sample of 215 male World
War II and Korean War veterans was rated
by their therapists on 61 scales descriptive of 
patient behavior, symptoms, and certain in- 
ferred needs and attitudes. Seven of the fac- 
tors hypothesized were fully confirmed while 
three factors were regarded as related to the 
original factors only in part. An analysis of 
the correlations between the primary factors 
resulted in the identification of two inter- 
pretable second-order factors. 
While problems of professional ethics and
of technique in psychotherapy have received 
much attention from clinical psychologists, 
almost no consideration has been given to 
value problems involved in carrying on treat- 
ment. The main points to be developed in this 
paper are that value judgments or positions 
are inescapable in psychotherapeutic prac- 
tice; that such judgments to date have been 
largely implicit and unrecognized; that they 
are based upon a relatively narrow conception 
of adjusted or healthy behavior; and that a 
major goal of psychotherapy should involve 
social change. 

Perhaps the first value decision which the 
psychologist faces is whether to do therapy or 
not. It has been argued by some that it is 
improper to attempt to change other people’s 
behavior—even though they request assist- 
ance—that it is unethical to interfere with 
the “natural” course of events. Stated this 
boldly, the problem hardly seems to have 
created conflicts for psychotherapists. The 
obvious solution to this problem, as stated, 
lies in recognizing that the position of non- 
interference in other persons’ behavior is no 
less of a value judgment and requires no less 
justification than the decision to interfere, 
i.e., to do therapy. Furthermore, the value 
position of interference is clearly congruent 
with broader values in our culture, such as 
those involved in the process of education. 

Some psychologists, incidentally, have not 
been comfortable with the foregoing position. 
They have, instead, attempted to avoid the 
problem by adopting a therapeutic role which 
minimizes the therapist’s interference. Such 
a position does not seem to escape
responsibility for intervention. It simply
means that the therapist’s role may vary
widely from a very active to a very passive
one. Nonetheless, at every point on this
dimension, the same basic value position
must have been taken, namely, that some
change in the patient is to be facilitated,
i.e., that therapeutic interference of some
sort is necessary or desirable. Once this is
done, the question of how best to achieve
such behavior change (whether by active or
passive therapy) becomes an empirical one.

1 An earlier version of this paper was read at the
1955 meeting of the APA in San Francisco.

A more difficult problem involving social 
values is the question of what direction ther- 
apy should take. In medicine the norm of 
health is a biological one and relatively clear- 
cut. The physician is rarely troubled about 
the direction treatment for physical illness 
should take. In psychology, however, the 
norm of health or adjustment is to a large 
extent relative and variable. It is generally 
undefined or else defined only implicitly 
within various theoretical approaches to per- 
sonality. The therapist, in every case how- 
ever, makes a choice of therapeutic direction, 
either explicitly or, most often, implicitly by 
working within the framework of a particular 
theory. It is the nature of these choices which 
constitutes a serious value problem in psycho- 
therapy. 

Discussions of psychotherapy in nearly all 
recent texts refer to two general directions or 
goals of treatment. The first of these is intra- 
personal and usually refers to the reduction 
or elimination of conflict or inner tension, 
or some such subjectively-oriented concept. 
Little or no concern is shown for the social 
origins, implications, or context of behavior. 
Perhaps the best example of this view is ex- 
pressed by Carl Rogers (9). He insists that 
the therapist must be willing to give the 
client full freedom as to outcomes, including
the choice of antisocial or immoral goals, re- 
gression, and even death. While Rogers as- 
sumes that such complete freedom will in- 
evitably lead in a positive direction, it is ap- 
parent that such an extremely intra-individual 
position as to the direction of therapy over- 
looks the fact that adjustment is in part so- 
cially defined and certain choices are simply 
not open to the client. 

This position is usually tempered by refer- 
ence to the second major goal of treatment— 
what Cameron (2) has called the agreement 
of behavior with cultural expectations; what 
others have referred to as social conformity. 
There are many difficulties in accepting this 
goal, too. Society is not an undifferentiated 
structure, neither is it static. The question 
must be asked, to what groups of society is 
the individual to conform, to which aspects 
of a changing society? In an earlier publica- 
tion, Rogers recognized these problems. He 
wrote: “There is no doubt that in some in- 
stances social work and even clinical work 
has been used as a means of blocking social 
progress. Unless the worker with individuals 
is alive to the significant movements and 
trends in our present-day culture, he may 
easily be seduced into upholding some fixed 
notion of the socio-economic situation” (8, 
p. 355). 

Most therapists are middle-class individuals.
The studies of Robert Havighurst (6)
and August Hollingshead (7) offer clear-cut 
evidence of the influence of the values and 
social goals of middle-class teachers (their 
emphasis on neatness, manners, and polite- 
ness) on their teaching practices and rela- 
tions with students. It is not unlikely that 
the same is true in psychotherapy. Some in- 
teresting related evidence on this point has 
recently been published. At one outpatient 
clinic a significant relationship was found be- 
tween the patient’s social class and the de- 
cision of the intake conference. In a study by 
Schaffer and Myers (10) it was found that 
whereas 64.7 per cent of patients from the 
professional and executive class were assigned 
for therapy to senior psychiatric staff and 
residents, only 2.4 per cent of the lowest class 
and only 33.4 per cent of the next lowest class 
received such high-level therapists. A patient
of bottom-class status has between five and
seven times the likelihood of not being
recommended for therapy as does the patient
from one of the top classes. The authors of 
this study mention, among other reasons for 
their findings, implicit evaluations by thera- 
pists as to what sort of patient is really more 
worth while. Such evaluations need to be 
made explicit in order to determine whether 
they reflect mere ethnocentric bias or some 
broad social orientation. 

Furst (5) has stressed the implicit values 
regarding social conformity contained in psy- 
choanalytic theory. He cites, for example, 
the interpretation frequently given neurotic 
women that their difficulties with men stem 
from “penis envy,” and suggests that such 
interpretations tend to exonerate society for 
the historically inferior position of women. 

Consideration of the social-value implica- 
tions and shortcomings of the two generally 
accepted goals suggests the need to include 
a third major goal in psychotherapy—what, 
for want of a better term, may be called so- 
cial contributiveness. This goal, too, repre- 
sents a particular value orientation and is not 
without difficulties, e.g., how to get agree- 
ment on what constitutes socially contribu- 
tive behavior. Nevertheless, its components 
tend to balance the subjective and static ori- 
entations of the other two concerning adjust- 
ment. 

Two aspects of social contributiveness may 
be mentioned. One involves the notion ex- 
pressed by Cantril (3) and L. K. Frank (4) 
that much personal conflict is due to objec- 
tive conflicts, contradictions, and shortcom- 
ings in the socioeconomic organization in 
which the patient lives. Therapy then ought 
to influence the patient to work toward the 
elimination of objective social conflicts or to- 
ward cultural reorganization in order to es- 
cape his own personal conflicts. The other 
aspect of social contributiveness stresses the 
possible importance, for stability of person- 
ality, of a broad social philosophy. This es- 
sentially is Adler’s (1) emphasis on the im- 
portance in therapy of developing what he 
called social feeling or cooperative work for 
the benefit of others. More recently Weiss- 
kopf-Joelson (11) has made a similar point. 
She feels a consistent Weltanschauung to be 
important for adjustment and notes its absence
in modern Western culture. The contemporary
stress on flexibility and tolerance
of ambiguity as symptoms of good adjust- 
ment suggests to her “a subterfuge of an un- 
stable culture which has little to give but 
ambiguity” (11, p. 604). 

To summarize, it is suggested that psycho- 
therapists direct part of their efforts toward 
increasing the social contributiveness of patients
in two ways: (a) by helping them
develop broader social feelings, and (b) by
encouraging active participation in the proc- 
ess of constructively changing their social en- 
vironments. Teaching the nature and impor- 
tance of social contributiveness makes therapy 
coterminous with education as a whole. 

The burden of this position, of course, rests 
upon the determination of what is socially 
contributive behavior, a difficult but not in- 
superable problem. In many cases, e.g., where 
social discrimination or chauvinism is in- 
volved in the disturbance, the answer is ob- 
viously contained in explicit values which are 
part of the general democratic tradition. For 
the future, however, this viewpoint would 
suggest two things. One is that consideration 
of the whole problem of social values in- 
volved in psychotherapy be part of the 
training which clinical psychologists (and, 
of course, psychiatrists) receive. The second 
is that some multidisciplinary group, e.g.,
SPSSI, might well undertake a broad study
of the issue of socially contributive behavior. 
The study of values need not be impervious 
to empirical research. For one thing, if we 
can get agreement on distant or more general 
values or goals, then the relationship of more 
immediate or specific values to those distant 
ones should be empirically determinable. 
In an earlier study utilizing 20 subjects
(5) we reported that interaction patterns for 
any given individual during a partially stand- 
ardized psychiatric interview had markedly 
high stability coefficients across two inter- 
viewers. In addition to demonstrating this 
stability of patient interaction patterns (effects)
when the stimulus (interviewer’s behavior)
was controlled along certain dimensions,
it was shown that these same highly
stable interviewee patterns could be reliably 
modified by planned changes in the inter- 
viewer’s behavior from one subperiod of the 
interview to another. 


The method of measurement used was the interac- 
tion chronograph, an instrument devised by Chapple 
(1, 2) for recording certain temporal aspects of the 
interactions between an interviewee and interviewer. 
A more complete description of the variables studied 
and the method of observation used will be found 
in our previous report (5), while a detailed review 
of the literature, a history of the development of the 
interaction chronograph, and the theory underlying 
it will be found elsewhere (4). 

Briefly, the standardized interview involves certain “rules” for the interviewer to follow during each of five predefined subperiods of the interview: Periods 1, 3, and 5 consist of free give-and-take interviewing, whereas Periods 2 (silence) and 4 (interruption) are stress phases of the interview. Through the interviewer’s varied behavior during these five periods, the standardized interview presents to an interviewee a sequence of different behavioral situations designed to elicit his characteristic interaction patterns in what can be thought of as varying miniature interpersonal situations. It should be noted that the stress of Periods 2 and 4 is provided by the overt behavior (“not responding” and “interrupting,” respectively) of the interviewer, and not by the “content” of what he says to the interviewee. The durations of Periods 1, 3, and 5 are fixed (10, 5, and 5 minutes, respectively), whereas the durations of Periods 2 and 4 are variable (12 failures to respond or 15 minutes, whichever is shorter, and 12 interruptions or 15 minutes, whichever is shorter, respectively).

1 This investigation was supported by a research grant (M-1107) from the National Institute of Mental Health, of the National Institutes of Health, U. S. Public Health Service.

2 The data of this study were collected while all three authors were at the Washington University School of Medicine.


Detailed definitions of the interaction variables investigated in this and the previous study can be found in our earlier reports (4, 5). Briefly they are: (a) A’s Units, the frequency of the patient’s actions; (b) Action, the average duration of A’s actions; (c) Silence, the average duration of A’s silences; (d) Tempo, the average duration of each action plus its following inaction as a single measure; (e) Activity, the average duration of each action minus its following inaction as a single measure; (f) A’s Adjustment, the average duration during which A “interrupted” B minus the duration during which A “failed to respond to” B; (g) B’s Adjustment, the average duration of the other (usually the interviewer) person’s adjustment; (h) Initiative, the relative frequency with which A initiated to B following a double silence, as in Period 2; (i) Dominance, the relative frequency with which A dominates B in a double action (interruption), as in Period 4; (j) A’s Synchronization, the frequency with which A failed to synchronize with B either by failing to respond to B or by interrupting B; (k) B’s Synchronization, the frequency of the other (usually the interviewer) person’s failure to synchronize with A; and (l) B’s Units, the frequency of the interviewer’s actions.
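The averaging definitions above can be made concrete with a short sketch. It assumes, purely for illustration, that one period of the interview has already been reduced to a list of patient units, each holding the duration of one patient action and of the patient inaction that follows it (in hundredths of a minute, the unit of Tables 2 and 3). The names `PatientUnit` and `summarize_period` are hypothetical, and the chronograph’s own computations (4, 5) may treat overlaps and the interviewer’s actions differently. By construction, Tempo here equals Action plus Silence and Activity equals Action minus Silence; the period means in Table 3 show approximately this relation (for Period I with Doctor 1, 68.40 + 9.15 = 77.55, against a reported Tempo of 77.90).

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class PatientUnit:
    """One patient action and the patient inaction that follows it (hypothetical record)."""
    action: float    # duration of the patient's action
    inaction: float  # duration of the patient's following inaction


def summarize_period(units: list[PatientUnit]) -> dict[str, float]:
    """Average a period's units into a few chronograph-style variables.

    Dividing by the number of units (rather than by clock time) is what lets
    periods of different length, and patients with different rates, be compared.
    """
    if not units:
        raise ValueError("a period needs at least one patient unit")
    return {
        "units": len(units),
        "action": mean(u.action for u in units),
        "silence": mean(u.inaction for u in units),
        "tempo": mean(u.action + u.inaction for u in units),
        "activity": mean(u.action - u.inaction for u in units),
    }


# Invented durations, used only to show the call.
period = [PatientUnit(60.0, 10.0), PatientUnit(80.0, 8.0), PatientUnit(40.0, 12.0)]
print(summarize_period(period))
```

With these invented durations the summary gives 3 units, Action 60, Silence 10, Tempo 70, and Activity 50.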

Since the reliabilities (stability coefficients) 
obtained for these variables in the previous 
study were unusually high for complex psy- 
chological variables such as one finds in the 
interview, it was decided to attempt to cross- 
validate the findings. The present study is an 
exact replication of the earlier one (5). 
Twenty additional outpatient Ss were inter- 
viewed twice, independently and in counter- 
balanced order by the same two interviewers 
who participated in the previous study (one 
a young internist, the other an older psy- 
chiatrist). All features of the experiment 
were identical: the rooms and the observer 
were the same; the Ss were all white; they 
ranged in age from 18 to 62 (median age of 
33); and they had presented complaints 
typical of the first series of 20 patients. 
Whereas in the first study there were 11 men 
and 9 women, the present sex distribution 
was 8 men and 12 women. 


Results 


As discussed above, 20 minutes of the 
standardized interview are fixed and, depend- 
ing upon the subject’s behavior in Periods 2 
and 4, up to 30 more minutes, totaling 50 in 
all, are possible to complete the standardized 
interview. Table 1 contains an analysis of the 
actual mean length of the 20 pairs of inter- 
views. The data from the original series are 
included for purposes of comparison. The re- 
sults of the cross-validation study are nearly 
identical with those of the original; again 
showing that despite individual patient variation
(shown equally with the two doctors),
the interview lasts, on the average, approxi- 
mately 32 minutes. This is in spite of the fact 
that in the original series one patient inter- 
view lasted 50 minutes, while in the present 
series the longest interview lasted only 41 
minutes. The results shown in Table 1 also 
duplicate the finding of no doctor effect per
se and show again that there is no evidence
of an order effect, since the mean lengths of
all first and all second interviews are virtually identical.
Interestingly, the earlier finding of a signifi- 
cant (p of .05) order effect for the Silence 
variable with the two doctors was not cross 
validated. The value of the covariance F was 
now found to be a nonsignificant .88, indicating
that both doctors were eliciting comparable 
mean silence values (as well as values for the 
other variables) from the 20 patients across 
the total interview. Also, as in the original 
study, on replication the mean lengths of 
Period 2 (7.63 and 8.28 minutes) and Period 
4 (2.47 and 2.35 minutes) were almost 
identical for the 20 patients with the two 
interviewers. The value of rho (.752) for 
length of Period 2 for each of the 20 patients 
with the two doctors was significant at the 
.001 level, whereas the extremely limited range
for Period 4 made the obtained value of rho 
(.008) of little practical use. 
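Rank-order coefficients such as the rho of .752 above can be computed as Pearson’s r applied to ranks. The sketch below assumes that convention, with tied values given the average of the ranks they occupy; it is a generic implementation, not the authors’ computing procedure, which is not described, and the function names are hypothetical. When nearly all values fall in a narrow range, as with the lengths of Period 4, ties and restricted spread make any correlation unstable, which fits the authors’ caution about the Period 4 value.

```python
from statistics import mean


def average_ranks(values: list[float]) -> list[float]:
    """Rank values 1..n; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the block while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared_rank = (start + end) / 2 + 1  # mean of 1-based ranks start+1 .. end+1
        for k in range(start, end + 1):
            ranks[order[k]] = shared_rank
        start = end + 1
    return ranks


def pearson_r(x: list[float], y: list[float]) -> float:
    """Product-moment correlation of two equal-length samples."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a sample has no spread")
    return sxy / (sxx * syy) ** 0.5


def spearman_rho(x: list[float], y: list[float]) -> float:
    """Rank-order correlation: Pearson's r computed on average ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))
```

With invented data x = [1, 2, 3, 4, 5] and y = [1, 4, 9, 16, 100], `spearman_rho` returns 1.0 because the two orderings agree perfectly, while `pearson_r` is below 1 because the relation is not linear.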

Table 2 presents, for the 20 patients, the 
means, standard deviations, and ranges (for 
the total interview) of nine of the interaction 
chronograph variables. Values are given for 
each doctor, with Pearson coefficients of cor- 
relation (stability coefficients) shown in the 
second column from the last. Pearson coef- 
ficients of correlation, shown in parentheses, 
are reproduced from the original study for 


Table 1

Analysis of Mean Length of Total Interview in Minutes for Two Series of 20 Patients Interviewed Twice

                                             Original Series         Replication Series
Analysis                                     Mean   Range            Mean   Range
Length of all 40 interviews                  32.8   25.7 to 50.3     31.5   24.5 to 41.1
a. Length of all first interviews            32.9   26.9 to 41.4     31.3   24.5 to 40.4
b. Length of all second interviews           32.8   25.7 to 50.3     31.6   25.6 to 41.1
c. Length of all first doctor’s interviews   33.5   26.9 to 50.3     30.9   24.5 to 40.4
d. Length of all second doctor’s interviews  32.2   25.7 to 41.2     32.0   25.6 to 41.1
Table 2

Means, Standard Deviations, Ranges, and Coefficients of Correlation Across Total Interview for Major I-C Variables

Variable                        Dr. 1          Dr. 2          r          p
1. Pt.’s Units        Mean      68.30          76.65          .926       .01
                      SD        25.95          26.69          (.747)†
                      Range     39 to 132      41 to 133
2. Pt.’s Action*      Mean      54.65          41.75          .786
                      SD        36.06          23.83          (.956)
                      Range     12 to 130      11 to 109
3. Pt.’s Silence      Mean      9.55           9.10           .744       .01
                      SD        3.16           2.26           ([?])
                      Range     5 to 19        6 to 15
4. Pt.’s Tempo        Mean      53.65          48.55          .905       .01
                      SD        22.33          19.61          ([?])
                      Range     19 to 100      19 to 100
5. Pt.’s Activity     Mean      35.00          30.35          .891
                      SD        23.35          20.28          (.930)
                      Range     4 to 87        3 to 85
6. Pt.’s Adjustment   Mean      -1.08          -.66           .853       .01
                      SD        1.43           1.29           (.802)
                      Range     [?]            [?]
7. Dr.’s Adjustment   Mean      2.24           1.93           .749
                      SD        1.21           .69            ([?])
                      Range     .77 to 4.82    .55 to 2.9
8. Pt.’s Synchron.    Mean      .84            .85            .741
                      SD        .08            .05            (.726)
                      Range     .69 to .97     .77 to .98
9. Dr.’s Units        Mean      58.75          63.45          .909       .01
                      SD        26.16          24.94          (.772)
                      Range     28 to 119      32 to 115

* Values for variables 2 through 7 are given in hundredths of a minute; to convert to seconds, multiply by 0.6.
† Values in parentheses are reproduced from the original study for comparison.
comparison. The values of r for the replication series, ranging from .741 to .926, again show a striking stability in interaction variables from first to second interview for each of the 20 patients. Furthermore, as can be seen from the means, standard deviations, and ranges, this stability is absolute as well as relative. Thus, for example, the ranges for “Pt.’s Units” show that during the same approximately 32-minute standardized interview one S interacted 39 times with Doctor 1 and 41 times with Doctor 2, whereas a second S interacted 132 times with Doctor 1 and 133 times with Doctor 2. Similar comparisons can be made for each of the other variables shown in Table 2.

In the original study, “Action” was the most stable variable, with an r of .956; on replication this value dropped slightly to .786. On the other hand, “Pt.’s Units,” with an r of .926, was now the most stable variable. Despite these slight (statistically nonsignificant) differences, however, all reliability coefficients on replication again were found to be significant at the .01 level of confidence. Thus, using predefined and partially standardized interview conditions, and taking the total interview as the sample, patients show markedly similar interaction patterns from one interviewer to the other, i.e., independent of which doctor is conducting the interview.
Table 3 demonstrates that interaction patterns are invariant only as long as stimulus (interview) conditions remain the same across the two interviewers for the total interview. As any single interviewer varies his own behavior within the interview, from one subperiod to another, there is a corresponding change in the means obtained by patients with him for any one variable. Thus, for example, the average interviewee values for “Silence” with Doctor 1 for Periods 1 to 5, respectively, vary from 9.15 up to 15.05, back down to 9.40, down still further to 5.65, and finally up again to 8.50 hundredths of a minute as Doctor 1’s own silence behavior varies (see “rules” in earlier report)


Table 3

Means and Standard Deviations for Two Interviewers for Each of the Major I-C Variables for Each Period in Replication Series

                 Pt.’s Units      Pt.’s Action     Pt.’s Silence    Pt.’s Tempo
Period           Dr. 1   Dr. 2    Dr. 1   Dr. 2    Dr. 1   Dr. 2    Dr. 1   Dr. 2
I        M       19.50   22.65    68.40   54.60    9.15    9.05     77.90   63.95
         SD      12.06   13.11    46.16   31.32    2.22    2.44     47.29   31.64
II       M       12.90   13.30    44.25   48.80    15.05   13.10    59.05   62.75
         SD      [?]     .84      29.46   31.97    9.06    4.50     28.05   30.01
III      M       9.95    11.85    87.40   53.05    9.40    9.30     97.10   62.70
         SD      6.79    6.44     87.30   37.52    4.21    2.90     87.19   38.04
IV       M       14.85   14.50    11.25   9.50     5.65    6.20     16.95   16.05
         SD      3.36    2.89     5.71    2.38     1.28    2.25     6.30    2.40
V        M       11.10   14.35    62.50   42.80    8.50    8.05     71.10   51.10
         SD      6.13    6.52     49.35   31.44    1.80    2.22     48.56   31.55
Total    M       68.30   76.65    54.65   41.75    9.55    9.10     53.65   48.55
         SD      25.95   26.69    36.06   23.83    3.16    2.26     22.33   19.61

                 Pt.’s Activity   Pt.’s Adjust.    Dr.’s Adjust.     Pt.’s Synchron.  Dr.’s Units
Period           Dr. 1   Dr. 2    Dr. 1   Dr. 2    Dr. 1    Dr. 2    Dr. 1   Dr. 2    Dr. 1   Dr. 2
I        M       59.55   45.65    -1.16   -.90     -.39     -.38     .99     1.00     18.85   24.40
         SD      45.15   31.12    1.49    1.68     .25      .18      .03     .00      11.56   12.17
II       M       29.75   36.25    -1.24   -.48     -11.15   -10.50   .26     .26      3.40    3.35
         SD      33.29   33.86    2.40    .57      4.96     3.63     .18     .12      2.31    1.62
III      M       78.30   43.85    -1.56   -1.09    -.52     -.44     1.01    1.00     9.90    11.10
         SD      87.70   37.00    3.39    1.78     .26      .14      .06     .03      6.80    5.73
IV       M       5.70    3.30     -.60    -.44     1.38     1.28     [?]     1.00     15.55   14.05
         SD      [?]     3.80     1.03    [?]      [?]      [?]      .03     .04      3.43    [?]
V        M       [?]     34.90    -1.04   [?]      [?]      [?]      [?]     [?]      [?]     [?]
         SD      [?]     31.52    1.38    [?]      [?]      [?]      [?]     [?]      [?]     [?]
Total    M       35.00   30.35    -1.08   -.66     [?]      [?]      .84     .85      58.75   63.45
         SD      23.35   20.28    1.43    1.29     [?]      [?]      .08     .05      26.16   24.94
Stability of Interaction Patterns: Replication


Table 4

Correlations for Two Interviewers for Each of the Major I-C Variables for Each Period in Replication Series

          Pt.’s Units     Pt.’s Action    Pt.’s Silence   Pt.’s Tempo     Pt.’s Activity
Period    r      rho      r      rho      r      rho      r      rho      r      rho
I         .909   .870     .711   .870     .618   .574     .706   .870     .718   .792
II        .201   .474     .828   .765     .613   .732     .857   .746     .776   .651
III       .788   .697     .428   .655     .481   [?]      .407   .650     .555   .630
IV        .578   .430     .676   .617     .233   .179     .215   .270     .728   .784
V         .823   .817     .662   .821     .555   .484     .666   .824     .658   .776
Total     .926   .917     .786   .914     .744   .828     .905   [?]      .891   [?]

          Pt.’s Adjust.   Dr.’s Adjust.   Pt.’s Synchron.  Dr.’s Units
Period    r      rho      r      rho      r      rho       r      rho
I         .713   .733     .254   .525     .000   .850      .909   [?]
II        .051   .375     .768   .308     .537   .591      .537   .594
III       .785   .565     .129   .023     -.027  .671      .783   .700
IV        .307   .227     .576   .591     .161   .546      .503   .395
V         .470   .306     .510   .614     .000   .793      .807   .830
Total     .853   .821     .749   .780     .741   .717      .909   [?]

Note. For r = .444, p < .05; for r = .561, p < .01. For rho = .427, p < .05; for rho = .543, p < .01.


from responding quickly (within one second 
of the patient’s last utterance) in Period 1 
to not responding for 15 seconds in Period 
2, to responding within one second in Period 
3, to interrupting the patient in Period 4, 
and to responding within one second again in 
Period 5. A similar pattern is evident for 
Doctor 2. The value for F across these 5 
correlated means (3, pp. 284-296) was 14.80 
for Doctor 1 and 23.44 for Doctor 2, both 
significant at the .001 level of confidence. 
Examination reveals that “Action” as a pa- 
tient interaction variable is also a function 
of what the interviewer is doing during a 
particular phase of the interview. Thus when 
the interviewer (Doctor 1, for example) was 
conducting a free give-and-take interview in 
Periods 1, 3, and 5, the patients’ average 
duration of utterance (“Action”) was 68.40,
87.40, and 62.50, whereas, while he was interrupting
them in Period 4, the patients’ average
action fell dramatically to 11.25 hundredths
of a minute. Interviewer
silence in Period 2 likewise was associated
with a slight drop in patient action to an 
average value of 44.25. A similar statistically 
significant (p of .001 in both cases) pattern is evident for patients’ “Action” with Doctor 2. It should be pointed out that we are here dealing with averages,3 since Chapple divides all patient variables in a given period by the number of patient units in that period, thus controlling for the differences in the lengths of the five periods, as well as controlling for individual differences in interaction rates among patients.

3 Analysis of the individual records of the 20 patients reveals that, independent of each patient’s own base line, all 20 showed the drop in “Action” from Period 3 to Period 4 with a subsequent increase in Period 5. This pattern was true for all 20 patients with Doctor 1 as well as Doctor 2. The picture is less striking for increased “Silence” in Period 2 relative to Periods 1 and 3; with Doctor 1 only 15 out of the 20 showed the group pattern shown in Table 3. Interestingly, the same 5 patients (and only these) again showed this deviation from the group trend with Doctor 2, thereby indicating the “stability” of this pattern despite this deviation from the group average.
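The F values of 14.80 and 23.44 reported above test whether the five period means differ when the same 20 patients contribute a score to every period. A minimal sketch of the usual subjects-by-periods (repeated-measures) analysis of variance follows; it is offered as an assumption about the method of the cited text (3, pp. 284-296), applies no correction for unequal correlations among periods, and the function name is hypothetical.

```python
def repeated_measures_f(scores: list[list[float]]) -> tuple[float, int, int]:
    """One-way repeated-measures F for a subjects x conditions table.

    scores[i][j] is subject i's value in condition (period) j.
    Returns (F, df_conditions, df_error).
    """
    n = len(scores)
    k = len(scores[0]) if scores else 0
    if n < 2 or k < 2 or any(len(row) != k for row in scores):
        raise ValueError("need a complete table with at least 2 subjects and 2 conditions")
    grand_total = sum(sum(row) for row in scores)
    correction = grand_total ** 2 / (n * k)
    ss_total = sum(x * x for row in scores for x in row) - correction
    # Between-conditions sum of squares, from the column (period) totals.
    ss_conditions = sum(sum(row[j] for row in scores) ** 2 for j in range(k)) / n - correction
    # Between-subjects sum of squares, removed so each patient serves as his own control.
    ss_subjects = sum(sum(row) ** 2 for row in scores) / k - correction
    ss_error = ss_total - ss_conditions - ss_subjects  # subjects x conditions residual
    df_conditions = k - 1
    df_error = (n - 1) * (k - 1)
    f = (ss_conditions / df_conditions) / (ss_error / df_error)
    return f, df_conditions, df_error
```

For the present data, `scores` would be a 20 × 5 table (patients by periods) of one variable with one interviewer. With the invented 3 × 3 table `[[1, 2, 4], [2, 3, 3], [3, 5, 6]]` the function returns an F of about 9.25 with 2 and 4 degrees of freedom.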

The values shown in Table 3 are thus a 
replication of the earlier finding that means 
of interaction patterns of patients, while in- 
variant under identical interinterviewer con- 
ditions across the interview as a whole (Table 
2), are markedly susceptible to changes in 
the stimulus conditions (presented as intra- 
interview changes from Periods 1 to 5 in the 
behavior of a single interviewer). Thus Table 
3 shows that as any single interviewer changes 
his behavior from one period to another there 
are associated changes in interviewee inter- 
action patterns; whereas Table 2 shows that 
a second interviewer can elicit almost identi- 
cal interviewee patterns (invariance) during 
this total interview provided he changes his
behavior in each of the five periods in a
manner similar to the first interviewer (i.e.,
provided both interviewers present identical
stimulus conditions to the patient during the
total sample, which averages 32 minutes).

The extent to which the mean patient in- 
teraction values shown in Table 3 for a single 
doctor are reliable for any one period, in con- 
trast to the interview as a whole, can be seen 
in Table 4. In the original study, 37 of the 45 
period-variable correlations (9 variables times 
5 periods) were statistically significant; 27 
at the .01 level and 10 at the .05 level. 
Most of the remaining 8 represented artifacts
of measurement, due most frequently
to extremely narrow ranges or limited frequencies,
and occasionally to skewing. On
replication, as seen in Table 4, 32 of the 45 
period-variable combinations have statistically 
significant Pearson r’s; 24 at the .01 level 
and 8 at the .05 level. Values for the rank- 
order correlation, rho, are also shown for 
each variable in Table 4. Use of this statistic 
yields 38 significant values out of the total 
of 45, with 34 of the 38 reaching the .01 level 
of confidence. The advantage of rho with 
these data is shown clearly for the variable 
“Pt.’s Synchronization” where, due to a lim- 
ited range, four out of the five period-variables 
fail to reach significance when r is used and, 
in contrast, all five reach the .01 level when
rho is used. In some cases, then, rho is a
better statistic for these data. We have not
used it routinely only because, unlike r, it is
a terminal statistic: it does not involve the
computation of measures, such as means and
variances, which can be used in further analyses.
The values of rho for the total interview
are given at the bottom of Table 4, and these
can be compared with the values of r copied
from Table 2 for the same variables. However,
whether r or rho is used, it is well
to remember that the correlations shown in 
Table 4 are based on only a small fraction of 
the total interview sample (e.g., Period 3 
correlation values represent only 5 minutes 
of the total 32-minute interview). One is 
therefore all the more struck by the similarity 
or stability of interaction patterns exhibited 
by the 20 patients with the 2 interviewers. 
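The significance levels in the note to Table 4 follow from the usual t test of a Pearson r against zero, t = r√(n - 2)/√(1 - r²), which can be solved for the smallest significant r, r = t/√(t² + df) with df = n - 2. The sketch below takes the two-tailed critical t values for 18 degrees of freedom (2.101 at .05, 2.878 at .01) from standard tables and reproduces the .444 and .561 thresholds for 20 patients; the rho thresholds in the same note come from rank-correlation tables and are not reproduced by this formula. The function names are hypothetical.

```python
import math

# Two-tailed critical values of Student's t for df = 18 (20 patients),
# copied from standard t tables rather than computed here.
T_CRITICAL_DF18 = {0.05: 2.101, 0.01: 2.878}


def t_for_r(r: float, n: int) -> float:
    """t statistic for testing a Pearson r from n pairs against zero."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)


def critical_r(t_crit: float, n: int) -> float:
    """Smallest |r| whose t reaches t_crit with n - 2 degrees of freedom."""
    df = n - 2
    return t_crit / math.sqrt(t_crit * t_crit + df)


for alpha, t_crit in T_CRITICAL_DF18.items():
    print(f"p < {alpha}: r >= {critical_r(t_crit, 20):.3f}")
```

This prints thresholds of 0.444 and 0.561. As a check against Table 4, the Period II synchronization r of .537 gives a t of about 2.70, between the two critical values, so it is significant at the .05 but not the .01 level.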
Finally, Table 5 shows the stability coefficients for the “Initiative” (Period 2) and “Dominance” (Period 4) variables in both the original and present studies. These are values for r; values for rho are .556 and .713 for Initiative and Dominance, respectively, both significant at the .01 level. Although on cross validation there is a reversal in their relative magnitudes, both variables show considerable stability.4 When one remembers that Period 2 (silence stress) lasted only 7 to 8 minutes on the average, and Period 4 (interruption stress) only slightly over 2 minutes, these stability coefficients are quite striking. The degree of invariance of a single individual’s “initiative” or “ascendance-submission” patterns could be investigated in greater detail by extending in a separate study the lengths of these two periods to perhaps 15 minutes each. It is reasonable to assume that a larger sample of these behaviors would be associated with a higher stability coefficient.

Before leaving Table 5, it should be noted again that, on the average, patients take the initiative more than they have to be initiated to (means of +.72 and +.73) under conditions of silence, and submit more than they dominate (-.36 and -.43) under conditions of interruption. The fairly large standard deviations for each of these variables indicate that these means are only group trends and that individuals vary considerably in these two dimensions of their interaction patterns.

4 Our earlier suggestion that the reliability of Dominance, which was lower than Initiative in the first study, might be a function of the “faster pace” of Period 4 relative to Period 2 is not borne out by this reversal in the relative magnitudes of r for these 2 variables.

Table 5

Means, Standard Deviations, Ranges, and Coefficients of Correlation for Two Interviewers for Initiative (Period II) and Dominance (Period IV) in Original and Replication Series

                          Original Series                          Replication Series
Variable                  Dr. 1         Dr. 2         r            Dr. 1          Dr. 2         r
1. Initiative    Mean     .75           .77           .805**       .72            .73           .552*
                 SD       .18           .19                        .18            .14
                 Range    .17 to .94    .23 to 1.00                .25 to .92     .42 to .92
2. Dominance     Mean     -.32          -.42          .470*        -.36           -.43          [?]**
                 SD       .31           .30                        .31            .26
                 Range    -.71 to +.83  -.92 to +.29               -1.00 to +.13  -.92 to .00

* Significant at the .05 level of probability.
** Significant at the .01 level of probability.


Discussion


The results of this cross validation would indicate that interaction patterns such as the ones we have studied, as one dimension of personality, are both modifiable by planned changes in intra-interviewer behavior and remarkably stable from one interviewer to another in interviews conducted on the same day. This finding may be a function of many factors, one of which may be the “set” of the patient on the particular day he has his two interviews. If this is true, one would expect the patient’s interaction pattern possibly to differ if the two interviews were separated in time. We have completed a study, again on 20 subjects, in which a single interviewer interviewed the same patient on two occasions one week apart. The results of this study, now being analyzed, should enable us to judge the usefulness of the interaction chronograph as an instrument for measuring changes of behavior over time (for example, as a function of interpolated psychotherapy).


However, the results of the two studies already completed seem to indicate that the interaction chronograph has promise as an instrument for assessing changes in behavior during a single day (test-retest changes in behavior as a function of the administration of a drug, for example).

Another factor which is of theoretical as well as practical importance is the effect of the content of the interview on the interaction patterns. We have been working on a method of content analysis and hope soon to be able to compare, for these same interviews, what the interviewee said with how he said it, i.e., “content” vs. interaction chronograph pattern. In this manner we plan to investigate Chapple’s hypothesis that interaction patterns are relatively independent of content.


Summary 


The present investigation, a replication of 
an earlier one, was a study of the interaction 
patterns of 20 patients who were interviewed 
independently by two interviewers on the 
same day. The demonstrated cross validation 
of our earlier findings indicates that a single 
patient’s interaction pattern is both susceptible to planned changes in a single interviewer’s behavior (intra-interviewer modifiability) and remarkably stable, or invariant, from one interviewer to another (interinterviewer stability). Research currently in progress is designed to investigate further aspects of this time dimension of interaction patterns.
The Relation of Brain Injury and Visual Perception
to Block Design Rotation 


Harold L. Williams, Ardie Lubin, Charles Gieseking 
and Irvin Rubinstein 


Walter Reed Army Institute of Research 


This report concerns the phenomenon called 
rotation which sometimes occurs on block- 
design tests or other visual-motor tasks where 
designs have to be reproduced. The purpose 
of the investigation was to repeat and extend 
the studies of rotation reported by Shapiro 
(1, 2, 3) and Yates (4). In a recent series 
of papers they discussed the effects on rota- 
tion of such factors as changes in geometric 
properties of the target, blocking the sub- 
ject’s (S’s) peripheral vision, intelligence of 
the S, and brain injury. 

In the Block Design Rotation Test 
(BDRT), which was devised by Shapiro, the 
S’s task is to reproduce a blue and yellow de- 
sign, one inch square, painted on a white card 
six inches square. Each design requires four 
blocks taken from the Wechsler-Bellevue
Block-Design subtest. Rotation (R) is meas- 
ured as the number of degrees which the S’s 
reproduced design differs in orientation from 
that of the target design. These scores are 
summed over all forty cards. 
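The scoring rule can be sketched in a few lines. This is a minimal sketch under a stated assumption: each card’s rotation is taken as the smaller angle between the target and reproduced orientations, so a 315-degree turn counts as 45 degrees. The source does not say how direction or large angles are scored, and the function names are hypothetical.

```python
from collections.abc import Iterable


def card_rotation(target_deg: float, reproduced_deg: float) -> float:
    """Orientation difference for one card, folded into 0..180 degrees (assumed rule)."""
    diff = abs(reproduced_deg - target_deg) % 360
    return min(diff, 360 - diff)


def rotation_score(cards: Iterable[tuple[float, float]]) -> float:
    """R: per-card rotations summed over all cards (40 in the BDRT)."""
    return sum(card_rotation(target, reproduced) for target, reproduced in cards)
```

For three invented cards, `rotation_score([(0, 45), (90, 90), (0, 315)])` gives 45 + 0 + 45 = 90 degrees under this assumption.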

The BDRT was constructed especially to 
test the assumed relation between rotation 
and certain geometric properties of the target 
designs. These properties arise from the na- 
ture of the ground, the figure, and the type 
of symmetry involved. The ground—the white 
card on which the design is painted—is pre- 
sented either as a square or a diamond. The 
figure, or design, is also presented either as a 
square or diamond. The line of symmetry is 
an imaginary line which cuts the design into 
mirror images. Two kinds of symmetry are 
possible: rectangular, in which the line of symmetry
is vertical or horizontal, and diagonal,
in which the line of symmetry is 45 degrees
from vertical. In Figure 1a, for example, the
target has a square ground, square figure, and
diagonal symmetry. In Figure 1b, it has a square
ground, diamond figure, and rectangular symmetry.

Over the 40 cards with their 10 basic de- 
signs, all possible combinations of symmetry, 
figure, and ground occur an equal number of 
times. They are presented in a balanced order
developed by Joan May (2, p. 617).

Shapiro’s findings concerning the geometric
properties of the target can be summarized
as follows: (a) diagonal symmetry produces
more rotation than rectangular symmetry;
(b) diamond figure produces more rotation 
than square figure; (c) diamond ground pro- 
duces more rotation than square ground; (d) 
the order of effect of the three properties is 
symmetry, figure, and ground; and (e) the 
effects of symmetry, figure, and ground are 
additive. 
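The balanced design and the additivity claim can be illustrated together. Three two-valued properties give 2 × 2 × 2 = 8 combinations, so 40 cards allow each combination to appear exactly five times. The additive model below uses invented weights chosen only to respect the ordering in (d); they are not Shapiro’s estimates, and all names are hypothetical. Additivity (e) means that the rotation added by, say, diagonal symmetry is the same whatever the figure and ground.

```python
from itertools import product

SYMMETRY = ("rectangular", "diagonal")
FIGURE = ("square", "diamond")
GROUND = ("square", "diamond")

# Invented per-card effects in degrees; only their ordering
# (symmetry > figure > ground) follows the findings summarized above.
EFFECT_DIAGONAL_SYMMETRY = 30.0
EFFECT_DIAMOND_FIGURE = 20.0
EFFECT_DIAMOND_GROUND = 10.0


def all_combinations() -> list[tuple[str, str, str]]:
    """Every symmetry/figure/ground combination in the design."""
    return list(product(SYMMETRY, FIGURE, GROUND))


def predicted_rotation(symmetry: str, figure: str, ground: str, baseline: float = 0.0) -> float:
    """Additive model: each rotation-prone property adds its own fixed amount."""
    total = baseline
    if symmetry == "diagonal":
        total += EFFECT_DIAGONAL_SYMMETRY
    if figure == "diamond":
        total += EFFECT_DIAMOND_FIGURE
    if ground == "diamond":
        total += EFFECT_DIAMOND_GROUND
    return total


cards_per_combination = 40 // len(all_combinations())  # 40 cards / 8 combinations
```

Under this model a card with diagonal symmetry, diamond figure, and diamond ground is predicted to rotate most, and the gain from diagonal symmetry is identical on every combination of figure and ground.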

Shapiro, and later Yates, found that brain- 
injured Ss rotated significantly more than 
control Ss. About 75 per cent correct classification
can be obtained with their data when
R is used as a predictor variable.

Fig. 1. Two possible combinations of symmetry, figure, and ground.

With the finding that the factors of sym- 
metry, figure orientation and ground orienta- 
tion appear to have directional properties 
which affect the amount of rotation, Shapiro 
developed a theory to account for the rela- 
tionship between brain injury and rotation. 
He argued that if one could assume that the 
brain-injured Ss tended to focus on the cen- 
tral part of the target at the expense of the 
periphery, then directional properties of the 
target would presumably exert a more power- 
ful influence toward rotation than the direc- 
tional properties of the more peripheral back- 
ground. He then deduced that non-brain-in- 
jured Ss, deprived of peripheral vision, would 
rotate as much as brain-injured Ss, and that 
they would respond to the same geometric 
properties of the target (symmetry, figure, 
and ground). To test this deduction he fash- 
ioned a mask (field reducer) which covered 
one eye but permitted vision with the other 
eye through an aperture 6 mm. in diameter. 
Control Ss wearing the field reducer and 
working on a plain black felt surface rotated 
as much as brain-injured Ss and responded 
to the same geometric properties of the target, 
thus confirming the hypothesis. 

Three main problems are considered here: 
(a) the effect of type of symmetry, orienta- 
tion of figure, and orientation of ground on 
rotation; (b) the effect of blocking peripheral
vision when Shapiro’s experiment is extended 
and brain-injured Ss as well as controls work 
with a field reducer; and (c) the effect of 
differences in intelligence (a variable which 
was studied by Shapiro and Yates but which 
they do not appear to regard as a major ex- 
planatory one). 

In the present study a group of brain-in- 
jured patients was again compared with a 
group of non-brain-injured controls, and with 
a control group wearing a field reducer. An- 
other group of brain-injured Ss wearing the 
field reducer was added in order to assess the 
effects of the field reducer in conjunction with 
brain injury. A group of control Ss with low 
intelligence was added in order to explore 
more thoroughly the effects of intelligence on 
rotation. Thus the following five groups of 
20 Ss each are involved: 
















Brain-injured without field reducer (BG). 

Brain-injured with field reducer (BF). 

Controls without field reducer (CG). 

Controls with field reducer (CF). 

Low-intelligence controls without field re- 
ducer (CD). 




Subjects and Procedure 
Selection of Subjects 


Forty male brain-injured patients were se- 
lected from the Neurology and Neurosurgery 
Services at Walter Reed Army Hospital. The 
brain-injured sample was heterogeneous with 
respect to diagnosis. There were 18 head-in- 
jury cases, 13 patients with neoplasms or 
vascular anomalies, and 9 with diffuse brain 
pathology of various kinds. The Ss can be 
described as young males, in good health 
prior to illness, who were examined early in 
recuperation from injury or surgery. In gen- 
eral their brain injuries were acute rather 
than chronic, and diffuse rather than focal.¹


Sixty male controls were obtained from among 
non-brain-injured medical patients. They were het- 
erogeneous with respect to diagnosis, but were am- 
bulatory, and not acutely ill. They had exhibited no 
neurological symptoms on a routine examination at 
admission. Twenty of these controls were selected 
on the basis of low intelligence. Their scores on 
Aptitude Area I of the Army Classification Battery 
were at or below the thirty-second percentile.²

Not all Ss were able to complete all designs well 
enough for rotation to be measured. If Ss were un- 
able to complete at least three of the five designs in 
one of the subsets used to test the effects of figure, 
ground, and symmetry, they were dropped from the 
study. There were four such Ss in each of the first 
four subgroups. Sampling was continued until 20 Ss 
per group were obtained. It has not been possible to 
evaluate the biasing effect of dropping these Ss. 

The field reducer was assigned randomly to 20 of 
the brain-injured group and to 20 of the controls. 
Since the effect of eliminating peripheral cues (in- 
dependent of the working surface) was being tested, 
all Ss worked on a black felt cloth spread over the 
table top. 








1 Our thanks are due Lt. Colonel James F. Ham- 
mill, Chief, Neurology Service, Walter Reed Army 
Hospital, who made the final diagnostic decision on 
each case. 

2 Aptitude Area I is based on scores on verbal type 
subtests which have been shown by Stone (Stone, H. 
Personal communication, 15 July 1955) to have sub- 
stantial correlations with Wechsler Bellevue Full- 
Scale IQ. The percentile corresponds roughly to an 
IQ of 80 to 85. 
Brain Injury and Block Design Rotation


Table 1

                 Ground                Figure                 Symmetry
              Significance          Significance           Significance
Group   D > S*    Level      D > S*    Level      Di > Re†    Level
BF        14       NS          19       1%           14         NS
BG        12       NS          15       5%           13         NS
CF        10       NS          19       1%           15         5%
CG         9       NS          19       1%           16         1%
CD        15       5%          16       1%           17         1%
Sum       60       5%          88       1%           75         1%

* “D > S” refers to the number of Ss whose total amount of rotation was greater on the 20 designs with diamond shapes than on the 20 designs with square shapes.
† “Di > Re” refers to the number of Ss whose total amount of rotation was greater for designs with diagonal symmetry than for designs with rectangular symmetry.


Prior to administration of the BDRT, all Ss in 
the first four groups were given five subtests of the 
Wechsler Bellevue Adult Intelligence Scale, Form 1. 
These were Arithmetic, Block Design, Similarities, 
Vocabulary, and Comprehension. Because of time 
pressure, the CD group was given only Block De- 
sign and Arithmetic. 


Procedure 


The procedure was essentially the same as 
Shapiro’s. The cards were presented one by 
one, centered on the table from left to right, 
and with the edge or corner nearest the S 
placed 12 inches from his edge of the table. 
The S was seated as close to his edge as pos- 
sible and was told to construct the designs 
as close to the edge as he could. The experi-
menter sat opposite the S. When the S fin-
ished a design he was instructed to turn away
and close his eyes. At this point, time was re- 
corded and the degrees of absolute rotation 
measured with a protractor. 


Results and Discussion 


By way of a summary statement, the fol- 
lowing points can be made. The results are in 
general but not complete agreement with 
Shapiro’s on the relationship between rotation 
and the various stimulus properties. The ef- 
fect of blocking peripheral vision in brain- 
injured Ss is not in line with Shapiro’s 
hypothesis, and differences in intelligence ap- 
pear to be more important than he assumed. 


Stimulus Properties and Rotation 


To measure the effect of each stimulus 
property, different subsets of problems were 





compared with one another. For example, in 
measuring the effect of ground, an S’s rota- 
tion score on the 20 cards with a diamond 
ground was compared with his score on the 
20 cards with a square ground. Then the 
number of Ss in each group was determined 
for whom the first score was higher than the 
second. 

Table 1 summarizes these results and pre- 
sents also the results of sign tests. The sign 
test was used rather than the usual t test be-
cause in all groups, R was distributed non-
normally with a large positive skew.
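The sign tests in Table 1 can be recomputed from the printed counts alone. The Python sketch below is a minimal check, not the authors' computation: it assumes an exact one-tailed binomial test with probability 1/2 per S, treats every S in a group as one trial (n = 20, or n = 100 for the summed row), and maps p-values onto the 5 and 1 per cent labels. The article does not say whether one- or two-tailed tests were used; the one-tailed assumption is ours, though under it every label in Table 1 is reproduced (a two-tailed test would, for example, put 16 out of 20 at the 5 rather than the 1 per cent level).

```python
from math import comb


def sign_test_p(successes: int, n: int) -> float:
    """One-tailed exact sign test: P(X >= successes) for X ~ Binomial(n, 1/2)."""
    return sum(comb(n, k) for k in range(successes, n + 1)) / 2 ** n


def level(p: float) -> str:
    """Map a p-value onto the labels used in the article's tables."""
    if p < 0.01:
        return "1%"
    if p < 0.05:
        return "5%"
    return "NS"


# Counts and printed labels transcribed from Table 1.
# Each group has 20 Ss; the "Sum" row pools all 100 Ss.
TABLE_1 = {
    "Ground": {"BF": (14, "NS"), "BG": (12, "NS"), "CF": (10, "NS"),
               "CG": (9, "NS"), "CD": (15, "5%"), "Sum": (60, "5%")},
    "Figure": {"BF": (19, "1%"), "BG": (15, "5%"), "CF": (19, "1%"),
               "CG": (19, "1%"), "CD": (16, "1%"), "Sum": (88, "1%")},
    "Symmetry": {"BF": (14, "NS"), "BG": (13, "NS"), "CF": (15, "5%"),
                 "CG": (16, "1%"), "CD": (17, "1%"), "Sum": (75, "1%")},
}


def check_table(table):
    """Return (property, group, count, p, printed, recomputed) for every cell."""
    rows = []
    for prop, groups in table.items():
        for group, (count, printed) in groups.items():
            n = 100 if group == "Sum" else 20
            p = sign_test_p(count, n)
            rows.append((prop, group, count, p, printed, level(p)))
    return rows


for prop, group, count, p, printed, recomputed in check_table(TABLE_1):
    flag = "" if printed == recomputed else "  <-- differs"
    print(f"{prop:8} {group:3} {count:3}  p = {p:.4f}  {printed:>3} {recomputed:>3}{flag}")
```

Ties between the two subset totals, if any occurred, are counted here as failures of the prediction, which makes the test slightly conservative; the article does not say how ties were handled.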

Table 1 shows that the diamond figure pro- 
duces more rotation than the square figure, 
and diagonal symmetry more than rectangular 
symmetry. These results are in agreement 
with Shapiro’s. The results on diamond vs. 
square ground, however, are somewhat equivo- 
cal (out of 100 Ss, 60 show a difference in 
the expected direction, but in only the CD 
group is the difference significant). The com- 
bined results are significant, however, at the 
5 per cent level.³

³ For complete tables of raw data, details of pro-
cedure, and description of the sample, as well as
for discussion of certain differences between our pro-
cedure and Shapiro’s, the reader is referred to Walter
Reed Army Institute of Research Interim Report
(in press) entitled “An Experimental Study of Block
Design Rotation.”

When Shapiro assessed the relative strength
of the geometric properties of the target, he
found that, in general, diagonal symmetry
was associated with more rotation than dia-
mond figure, and that diamond ground was
least effective. Table 1 indicates that, in the 
present study, figure tended to affect the 
amount of rotation more than symmetry for 
four of the five groups. The effect of ground, 
as Shapiro reported, tended to be weakest. 

What about the interaction of symmetry, 
figure, and ground? Shapiro reasoned that 
from the Gestalt viewpoint, incongruent com- 
binations should produce more rotation than 
congruent ones. He took the amount of rota- 
tion on trials within congruent figure-ground 
cards (ten designs with a square figure and 
square ground, plus ten designs with a dia- 
mond figure and diamond ground) and com- 
pared it with the amount of rotation on in- 
congruent figure-ground cards (ten designs 
with square ground and diamond figure, plus 
ten designs with diamond ground and square 
figure). If the effects of ground and figure 
were additive, then there should have been 
no difference between the two sets. There 
were no over-all significant differences in his 
data although there was a slight tendency 
for incongruence to produce more rotation. 
Shapiro’s analysis was repeated for this sam- 
ple, and Table 2 shows the results. It can be 
seen that there is a tendency for incongruent 
(I) figure-ground designs to produce more
rotation than the congruent (C) figure-ground 
designs. This tendency appears in each of 
the groups, and the combined results are sig- 
nificant at the 1 per cent level. The other 
two interactions do not reach a significant 
level in any group nor in the combined re- 
sults. 
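Shapiro's congruence comparison reduces to classifying each card by whether its figure and ground share an orientation, then counting, for each S, which set of cards drew more rotation. A minimal Python sketch under stated assumptions: the `(ground, figure, degrees)` record format, the function names, and the two Ss' rotation values are all hypothetical, invented only to show the bookkeeping. The resulting count is the quantity Table 2 reports as "I > C", which is then referred to the sign test.

```python
from typing import Iterable, List, Tuple

# Hypothetical record for one card: (ground orientation, figure orientation,
# degrees of rotation in the S's reproduction).
Trial = Tuple[str, str, float]


def is_congruent(ground: str, figure: str) -> bool:
    """A card is congruent when ground and figure share an orientation."""
    return ground == figure


def incongruent_exceeds(trials: Iterable[Trial]) -> bool:
    """True if this S's total rotation on incongruent cards exceeds that on congruent cards."""
    incongruent = congruent = 0.0
    for ground, figure, degrees in trials:
        if is_congruent(ground, figure):
            congruent += degrees
        else:
            incongruent += degrees
    return incongruent > congruent


def count_i_greater_than_c(subjects: Iterable[List[Trial]]) -> int:
    """Number of Ss for whom I > C, the count entered in Table 2."""
    return sum(incongruent_exceeds(trials) for trials in subjects)


# Hypothetical records for two Ss (four cards each, invented values).
s1 = [("square", "square", 5.0), ("diamond", "diamond", 10.0),
      ("square", "diamond", 20.0), ("diamond", "square", 0.0)]
s2 = [("square", "square", 30.0), ("diamond", "diamond", 0.0),
      ("square", "diamond", 5.0), ("diamond", "square", 5.0)]
print(count_i_greater_than_c([s1, s2]))
```

In the actual design each S contributes 20 congruent and 20 incongruent cards, so the two totals are sums over equal numbers of trials, which is what makes a direct comparison of totals reasonable.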


H. L. Williams, A. Lubin, C. 


Gieseking, and I. Rubinstein 


The present results, then, concerning the 
relation of rotation to the geometric aspects 
of the target designs differ from Shapiro’s 
in only two respects: (a) figure is found to 
be more powerful than symmetry, and (b)
figure-ground incongruence produces signifi- 
cantly more rotation than figure-ground con- 
gruence. Thus there appear to be not only 
independent effects of symmetry, figure, and 
ground, but also a Gestalt interaction be- 
tween figure and ground. 


The Effect of Blocking Peripheral Vision 


In line with Shapiro’s results, the present 
control group showed more rotation with the 
field reducer than without it (p < .01). In 
disagreement with the extension of Shapiro’s 
hypothesis, rotation by brain-injured Ss with
the field reducer is significantly less than
without the field reducer (p < .01). 

The scores on which these comparisons are 
based are given in the first two lines of Table 
3, which shows the mean and variance of ro- 
tation scores (M) for each of the five groups. 
(The logarithmic function $M = 100(\log R - 1)$
was used in place of R to reduce the
very large positive skew.)
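The transform and its inverse are short enough to state as code. The article writes log R without a base; the Python sketch below assumes base 10, and that assumption is flagged in the code as well. One consequence is worth keeping in mind when reading Table 3: averaging M and transforming back yields the geometric mean of R, not its arithmetic mean, so group means of M summarize rotation on a multiplicative scale.

```python
import math
from statistics import mean
from typing import Iterable


def rotation_score(r: float) -> float:
    """M = 100 (log R - 1), with the logarithm taken to base 10 (an assumption)."""
    if r <= 0:
        # log R is undefined here; the article does not say how such cases were handled.
        raise ValueError("R must be positive")
    return 100.0 * (math.log10(r) - 1.0)


def rotation_from_score(m: float) -> float:
    """Inverse of rotation_score: R = 10 ** (M / 100 + 1)."""
    return 10.0 ** (m / 100.0 + 1.0)


def back_transformed_mean(r_values: Iterable[float]) -> float:
    """Back-transforming the mean of M gives the geometric mean of R."""
    return rotation_from_score(mean(rotation_score(r) for r in r_values))


# For R = 10, 100, 1000 the geometric mean is 100, the arithmetic mean 370.
print(back_transformed_mean([10, 100, 1000]))
```

Under the base-10 assumption, the grand mean M of 145.28 in Table 3 back-transforms to an R of roughly 284.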

It will be recalled that Shapiro suggested 
that the brain-injured tend to disregard the 
periphery. If this were so, then blocking the 
visual periphery should make little difference 
to the patient, and one would expect about 
the same amount of rotation in the BF group 
as in the BG group. The fact that the BF 
group shows less rotation than the BG sug- 
gests that the field reducer may prevent pe- 


Table 2

Interaction of Symmetry, Figure and Ground

          Figure and Ground     Figure and Symmetry    Symmetry and Ground
            Significance           Significance           Significance
Group   I > C*    Level        I > C     Level        I > C     Level
BF        15       5%            10       NS            8        NS
BG        11       NS            10       NS           12        NS
CF        13       NS             8       NS           12        NS
CG        13       NS            10       NS           11        NS
CD        14       NS             7       NS            8        NS
Sum       66       1%            45       NS           51        NS

* “I > C” refers to the number of Ss whose total amount of rotation was greater on the 20 incongruent designs than it was on the 20 congruent designs.


Table 3
Rotation Scores (M) and Their Relation to Scores (T) on Wechsler Subtests

Statistic       BF         BG         CF         CG         CD       Total
N               20         20         20         20         20        100
M             131.25     165.55     144.85     115.95     168.80     145.28
s²_M        1,271.78   1,114.89   1,352.13     807.21   1,355.54   1,540.97
T              26.50      23.20      27.25      30.45      20.95      25.67
s²_T          111.95     122.90      92.62      69.73      39.63      91.90
r_MT           −.56       −.71       −.53       −.38       −.51       −.60
M − M_T       −12.37      15.34       2.72     −19.79      14.10


ripheral stimuli from distracting the brain- 
injured S, thus enabling him to concentrate 
more effectively on the target. In place of 
Shapiro’s “inattention” hypothesis, one would 
then substitute the notion that the periphery 
of brain-injured Ss provides distorted and 
confusing information about the visual frame 
of reference. 


The Effect of Intelligence 


How well can brain-injured Ss be discrimi- 
nated from controls? As Table 3 shows, the 
rotation scores in the BG group are much 
higher than those in the CG group. (The dif- 
ference is significant beyond the 1 per cent 
level.) If the M score of 136 is used as a cut- 
off, 75 per cent of the Ss in these two groups 
can be classified correctly. However, Table 3 
also indicates that there is a substantial nega- 
tive correlation between the M score and a 
score (T), where T equals the sum of Wechs-
ler Block Design and Arithmetic. Yates also
reported significant negative correlations be- 
tween intelligence and rotation. Since the CG 
group has higher intelligence scores than the 
BG group, perhaps not all of the 75 per cent 
correct classification was due to brain injury 
per se. This was the reason for adding the 
dull (CD) group. In fact, when CD is con- 
sidered part of the control group there is no 
cutoff M score that will produce usable dis- 
crimination between brain-injured and con- 
trols. This finding raises the question whether 
all of the difference between the rotation av- 
erages for the five groups may not be due en- 
tirely to differences in intelligence. To test 
this, an analysis of covariance of M was 
made, holding T constant. That is, each S’s 
M score was statistically equated for the ef- 





fect of intelligence. The results showed that 
the rotation means, even when corrected for 
intelligence, differed significantly beyond the 
1 per cent level. 

In Table 3 the line labeled M − M_T gives
the rotation means corrected for the effects 
of intelligence scores. An immediate contra- 
diction appears. The CD group has a signifi- 
cantly higher average rotation score than the 
CG group even after correction for intelli- 
gence. Since these groups were intended to 
differ only with respect to intelligence, this 
suggests that some unknown sampling bias 
toward higher rotation in the CD group may 
have existed. 
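The M − M_T row of Table 3 can be partly checked from the table's own summary statistics. Assuming the standard analysis-of-covariance adjustment, adjusted M_g = M_g − b_w (T_g − T_grand), where M_g and T_g are group means and b_w is the within-group regression slope of M on T pooled over the five groups, the Python sketch below recovers b_w (about −1.99) from the group variances and correlations. Expressed as deviations from the grand mean of 145.28, the adjusted means agree with the printed row to within 0.02. That suggests, although the text does not say so, that the row gives covariance-adjusted means as deviations from the grand mean. The variable names are ours; the figures are transcribed from Table 3.

```python
import math

GROUPS = ("BF", "BG", "CF", "CG", "CD")
# Summary statistics transcribed from Table 3 (20 Ss per group).
MEAN_M = {"BF": 131.25, "BG": 165.55, "CF": 144.85, "CG": 115.95, "CD": 168.80}
VAR_M = {"BF": 1271.78, "BG": 1114.89, "CF": 1352.13, "CG": 807.21, "CD": 1355.54}
MEAN_T = {"BF": 26.50, "BG": 23.20, "CF": 27.25, "CG": 30.45, "CD": 20.95}
VAR_T = {"BF": 111.95, "BG": 122.90, "CF": 92.62, "CG": 69.73, "CD": 39.63}
R_MT = {"BF": -0.56, "BG": -0.71, "CF": -0.53, "CG": -0.38, "CD": -0.51}
PRINTED = {"BF": -12.37, "BG": 15.34, "CF": 2.72, "CG": -19.79, "CD": 14.10}


def pooled_within_slope() -> float:
    """Pooled within-group slope of M on T.

    With equal group sizes the (n - 1) factors cancel, so summing the
    group covariances and the group variances of T is sufficient.
    """
    cov = sum(R_MT[g] * math.sqrt(VAR_M[g] * VAR_T[g]) for g in GROUPS)
    return cov / sum(VAR_T[g] for g in GROUPS)


def adjusted_deviations() -> dict:
    """Covariance-adjusted group means of M, as deviations from the grand mean."""
    b = pooled_within_slope()
    grand_m = sum(MEAN_M.values()) / len(GROUPS)
    grand_t = sum(MEAN_T.values()) / len(GROUPS)
    return {g: MEAN_M[g] - b * (MEAN_T[g] - grand_t) - grand_m for g in GROUPS}


for g, dev in adjusted_deviations().items():
    print(f"{g}: computed {dev:7.2f}   printed {PRINTED[g]:7.2f}")
```

This checks only the adjusted means; the F test for the corrected means needs within-group sums of squares that Table 3 does not supply in full.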

So far, then, it appears that rotation is af- 
fected somewhat by the geometric properties 
of the target card, and a great deal by the 
intellectual functioning of the S. It is diffi- 
cult to conclude from the data what the di- 
rect effect of brain injury is on rotation, since 
there appears to be an interaction of brain 
injury with the effect of eliminating periph- 
eral cues. 


Summary 


The Block Design Rotation Test was given 
to five groups (with 20 Ss in each): (a) brain- 
injured patients who wore a field reducer 
(i.e., eye-mask that restricted peripheral vi-
sion); (b) brain-injured patients without
field reducer; (c) non-brain-injured patients
without field reducer; (d) non-brain-injured
patients with field reducer; and (e) dull nor-
mal patients without field reducer. Most of
the relationships given by Shapiro and Yates 
were confirmed: 


1. A diagonal line of symmetry in the 
target design produced more rotation in the 
S’s reproduced design than a rectangular
(horizontal or vertical) line of symmetry. 

2. When the target design was presented 
as a diamond figure, there was more rotation 
than when it was presented as a square figure. 

3. When the target design was presented 
on a diamond ground, there was more rota- 
tion than when it was presented on a square 
ground. 

4. Both figure and symmetry were more 
powerful than ground in producing rotation. 

5. There was a substantial negative corre- 
lation between rotation and various subtests 
of the Wechsler Bellevue. 

The present findings differ from Shapiro’s 
in certain respects: 


1. When intelligence scores were held con- 
stant, the difference in rotation for brain-in- 
jured and controls remained statistically sig- 
nificant but was not great enough to be used 
for classification. 

2. There was a slight but significant Ge- 
stalt interaction between figure and ground. 
Rotation is not only a function of the sepa- 
rate effects of figure, symmetry and ground; 


H. L. Williams, A. Lubin, C. 


Gieseking, and I, Rubinstein 


it is also increased when figure and ground 
are incongruent. 

3. Reducing peripheral vision increased ro- 
tation in controls, and appeared to decrease 
rotation in the brain-injured. This finding 
suggests that whereas the normal S probably 
uses peripheral cues as guides to correct ori- 
entation of his designs, the brain-injured S 
may be confused and distracted by them. 
Masking his visual periphery assists rather 
than hinders him in orienting his design. 


Received October 10, 1955. 
A Second Study of Psychological Changes During the
First Year Following Prefrontal Lobotomy¹


John F. Winne and Isidor W. Scherer 
VA Hospital, Northampton, Mass. 


Scherer and his associates have published 
two reports (11, 12) dealing with the changes 
in a sample of schizophrenic patients follow-
ing prefrontal lobotomy. The present paper
reports the results of a similar experiment 
utilizing an independent sample. 

Earlier studies of the effect of prefrontal 
lobotomy on psychological test scores (1, 2, 
4, 7, 9) have yielded conflicting results. 
Scherer et al. (12) suggested that part of this 
discrepancy might be due to differences in the 
length of the interval between pre- and post- 
testing. Other factors, cited from a review of 
the literature, which might lead to conflicting 
results included lack of control subjects, het- 
erogeneity of the sample populations under 
study, variations in the type of operation, and 
failure to integrate test findings with clinical 
data. By controlling these factors in an ap- 
proach which involved successive verification 
of hypotheses (12, pp. 2-5) Scherer et al. 
demonstrated presumably reliable changes in 
73 out of a total of 118 test measures. 

The results of the original experiment ap- 
peared to support four generalized hypotheses 
which can be stated briefly as follows: (a) 
Immediately following lobotomy there is a 


¹ From the Veterans Administration Hospital,
Northampton, Massachusetts. The authors wish to 
thank Drs. Robert L. Fortier and Frank Politzer, 
former postdoctoral trainees at this station, for as- 
sistance in testing patients; Dr. Robert W. Baker, 
Department of Psychology, Clark University, Lionel 
M. Ives, M.D., Chief, Professional Services, and 
Drs. Cesareo D. Peña, Arthur S. Tamkin, and Ar-
nold Trehub, Staff Clinical Psychologists, for judg-
ing qualitative variables and critically reviewing this
report; Messrs. Donald Isaac, Herbert Nickel, Jo- 
seph Mayer, and Alexander Stern, trainee Clinical 
Psychologists, for assistance in scoring quantitative 
measures and recording data. 


general decline in mental efficiency reflected 
in lowered test scores at two weeks following 
lobotomy. Full recovery from the decline ap- 
pears to take place by three months follow- 
ing lobotomy. At one year after the opera- 
tion, a trend toward reduced efficiency is again 
noted. (b) Tests measuring strength of ego
boundaries show a steady increase throughout 
the first postoperative year. (c) There ap- 
pears to be greater sexual awareness and con- 
cern throughout all intervals during the first 
year following lobotomy. (d) Measures of
rate of action and impulsivity suggest in- 
creased speed of reaction up to three months 
after lobotomy followed by a trend toward 
slower speed. 

Certain methodological shortcomings in the 
original study, however, cast some doubts on 
these findings. First, since not all subjects 
could be tested on all measures at each oc-
casion (either because of lack of cooperation
or unavailability of the subjects due to ill-
ness, home visits, etc.), the sample utilized
for the various comparisons of any single 
measure differed from one time interval to 
another. Second, it was found impossible to 
match experimental and control patients for 
preoperative performance on each test, al- 
though the groups were fairly well equated 
for such background data as age, education, 
length of hospitalization, level of cooperation, 
diagnostic classification, and certain behav- 
ioral characteristics (talkativeness, assaultive- 
ness, hallucinatory experiences, etc.). Third, 
it has been demonstrated (8, 15) that if one 
gives a large battery of tests to two groups of 
people and then examines the data in order 
to find differences between the groups, a cer-
tain number of differences will be expected
Table 1
Age, Chronicity, and Diagnostic Groups for the Lobotomized and Control Patients
in the First and the Second Experiment

                                    Operated Patients              Control Patients
Variable                          First          Second          First          Second
Number of Cases                    22              15              22              10
Age, in years                 33.9 (SE 1.3)   32.6 (SE 1.6)   30.4 (SE 1.0)   37.1 (SE 2.9)
Length of Hospitalization,
  in years                     4.1 (SE 0.5)    6.2 (SE 1.7)    3.3 (SE 0.4)    7.9 (SE 2.4)
Diagnostic Group
  Catatonic                      36.4%           53.3%           36.4%           50.0%
  Paranoid                       22.7%           13.3%           18.2%           20.0%
  Hebephrenic                    18.2%           20.0%            9.1%           20.0%
  Mixed                          22.7%           13.3%           36.4%           10.0%





to arise by the operation of chance factors 
alone. Thus, for example, of the 11 changes 
(each significant at the .10 level) in the di- 
rection of decreased mental efficiency two 
weeks following lobotomy, which Scherer re- 
ports in his Table 2 (12, p. 11), approxi- 
mately four of them might be chance rela- 
tionships. 

For these reasons, the entire study was re- 
peated on a different sample. The present 
study cannot be considered a replication of 
the original experiment (12) since important 
variations, to be discussed below, were made 
in the experimental procedure. However, it 
was thought that if the effects of lobotomy 
are fundamental and far-reaching enough to 
show up in spite of these variations, this in- 
formation would certainly be useful. 


Subjects 


From a pool of 70 patients utilized in the 
Veterans Administration Central Office Re- 
search Project on Prefrontal Lobotomy (3), 
43 subjects were sufficiently cooperative and 
willing to receive at least partial pretesting 
with the Northampton lobotomy battery. Of 
these, only 15 lobotomized and 10 control 
subjects were testable to any degree through- 
out the first year. These 25 individuals were 
selected as subjects for the study reported in 
this paper. The 15 lobotomized and 10 con- 
trol patients were fairly well matched for age, 
diagnosis, chronicity, length of hospitaliza- 
tion, and a prognostic index based on past 


history and symptomatology present at the 
time of operation.² Table 1 gives the age,
diagnostic classification, and chronicity for 
these patients as well as for those utilized in 
the first experiment. 


An inspection of Table 1 indicates that although 
the first sample of operated patients was approxi- 
mately the same age as the second sample, they had 
been hospitalized for a considerably shorter period 
of time. The control samples, however, differ with 
respect to age as well as hospitalization, the second 
sample being drawn from a much more chronic 
population. The difference was not unpredictable 
since most of the potential control patients had been 
operated on before the start of the present experi- 
ment. The differences in age and length of hospitali- 
zation undoubtedly influenced the results of psycho- 
logical testing. 

A second difference between the two experiments 
is that of the motivation of the patient-subjects. In 
the first study, 22 lobotomized and 22 control pa- 
tients were given pretesting. Of these, 16 operated 
and 18 controls were willing and able to take the 
test battery one year later. In the second sample, 
however, much more attrition occurred. Of the 19 
lobotomized and 24 control patients originally tested, 
only 15 operated and 10 controls took any one of 
the entire series of tests. It was the impression of 
the examiners, moreover, although no specific data 
exist to bear this out, that the original control sam- 
ple appeared to be genuinely interested in taking the 
tests and that the second control sample was not. 

The third difference between the samples used in 
the two studies is found in the type of operation 
undergone by the lobotomized patients. The study 
reported in the Monograph (12) was based on 


2 The prognostic index is derived from Schedules 
B-1 and B-2 of the Veterans Administration Central 
Office Research Project on Prefrontal Lobotomy (3). 
changes following an open Lyerly-Poppen approach
while the present report dealt with changes follow- 
ing the McKenzie approach (nine patients) and the 
Scoville approach (six patients). The McKenzie ap- 
proach substitutes a McKenzie leucotome for the 
spatula used in the Lyerly-Poppen approach; Sco- 
ville’s bilateral cortical undercutting is a radically 
different operation. 

Since inspection of the data showed no gross dif- 
ferences in the effects of the McKenzie and Scoville 
operations, the results were pooled for the present 
study. However, statements of nursing personnel 
suggest that the present sample of patients, regard- 
less of the type of operation, showed much less im- 
mediate change in behavior in ward situations than 
did the patients in the first experiment. The dis- 
crepancy may be due, in part, to differences in the 
clinical status of the samples utilized in the two ex- 
periments, or, perhaps, to differences in the opera- 
tive technique (5). 


Procedure 


All operated patients were individually 
tested prior to, and two weeks, three months, 
and one year following lobotomy. Control pa- 
tients were individually tested at comparable 
intervals. This represents a further deviation 
from the procedure reported in the Mono- 
graph (12); in the earlier study, all patients 
were group-tested whenever possible. In the 
present study, however, since no more than 
two patients were operated on during any 
single week, group administration of tests was 
not possible. This probably influenced the re- 
sults of testing, as has been suggested by 
Scherer and others (10, 13) in spite of an at- 
tempt on the part of the examiners to be as 
little involved in the situation as possible. 


One hundred and six measures, derived from 27 
psychological tests, were utilized in the lobotomy 
test battery (cf. 12, pp. 6-10, 17). Specific empirical 
predictions were derived from an analysis of the 
statistically significant changes in the first study as 
to the direction of the net change in the operated 
group for each of the test measures for each time 
interval under consideration: (a) pretest—two weeks 
posttest, (b) pretest—three months posttest, (c) 
pretest—one year posttest, (d) two weeks posttest 
—three months posttest, (e) two weeks posttest— 
one year posttest, and (f) three months posttest— 
one year posttest. Each prediction was individually 
tested by ascertaining the significance of the net 
change in means using formula 7.25 of Walker and 
Lev (14, p. 157). Since the earlier study suggested 
that the variance for the two samples could not be 
presumed to be equal, the degrees of freedom were 
estimated by formula 7.26 (14, p. 158). Dichoto- 
mized data were tested by using McNemar’s net 
shift in proportion (6, pp. 81-82), the entries in the 


fourfold tables being the proportion of subjects pos- 
sessing the attribute in question. One-tailed tests 
were used throughout, with the level of acceptance 
being set in advance at p = .05.
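The net-change test can be sketched as an unequal-variance t with approximate degrees of freedom. This is an assumption about what Walker and Lev's formulas 7.25 and 7.26 compute (the Welch-Satterthwaite form is the familiar version of such a test). It further assumes that "net change" means the operated group's mean change minus the control group's mean change, each change score being a later test minus an earlier one. The function name and the change scores in the example are hypothetical.

```python
import math
from statistics import mean, variance
from typing import Sequence, Tuple


def net_change_t(operated: Sequence[float], control: Sequence[float]) -> Tuple[float, float]:
    """Unequal-variance t for operated-minus-control mean change, with approximate df.

    Each sequence holds one change score (later test minus earlier test) per
    patient. Assumed, not confirmed, to match Walker and Lev's formulas
    7.25 and 7.26 (the Welch-Satterthwaite form).
    """
    n1, n2 = len(operated), len(control)
    se1 = variance(operated) / n1  # squared standard error of the operated mean
    se2 = variance(control) / n2   # squared standard error of the control mean
    t = (mean(operated) - mean(control)) / math.sqrt(se1 + se2)
    df = (se1 + se2) ** 2 / (se1 ** 2 / (n1 - 1) + se2 ** 2 / (n2 - 1))
    return t, df


# Hypothetical change scores, for illustration only.
t, df = net_change_t([1, 2, 3, 4], [2, 4, 6, 8, 10])
print(f"t = {t:.3f}, df = {df:.2f}")
```

For the one-tailed tests described above, t would be signed so that the predicted direction is positive and compared with the one-tailed .05 critical value of t for the computed df. The Python standard library has no t distribution, so that lookup is left to a table or a statistics package.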


Results 


Two hundred and eleven specific predic- 
tions of net change for the operated group 
were empirically derived from the significant 
changes obtained in the original experiment 
(12).³ Of these, only 15 were accepted at the
.05 level of confidence, an outcome not sig-
nificantly different from chance (χ² = 3.7027,
df = 6, p > .50). The outcome for each of
the six time intervals under consideration was 
similar, as can be seen from Table 2. Those 
measures for which no prediction could be 
made, because they yielded no significant dif- 
ferences in the original experiment, also 
showed chance distributions of significant net 
changes. In view of the negative outcome, it 
was not felt worth while to examine in detail 
the status of the four generalized hypotheses 
of the Monograph (12). These hypotheses 
cannot be supported by the present study. 
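The chi-square entries in Table 2 follow from comparing, for each interval, the numbers of accepted and rejected predictions with the 5 per cent acceptance rate expected by chance. The Python sketch below recomputes them with a two-cell (df = 1) goodness-of-fit chi square per interval. It reproduces each printed value to within about 0.0002, and the printed total of 3.7027 (df = 6) equals the sum of the six printed interval values. Treating that sum as a 6-df chi square presumes the six interval tests are independent; the article does not discuss this. Closed-form tail probabilities are used because df is 1 or even.

```python
import math

ALPHA = 0.05  # proportion of predictions expected to be "accepted" by chance

# (total predictions, predictions accepted) per interval, from Table 2.
INTERVALS = {
    "Pre-Two Weeks": (39, 3),
    "Pre-Three Months": (30, 1),
    "Pre-One Year": (32, 2),
    "Two Weeks-Three Months": (42, 4),
    "Two Weeks-One Year": (33, 2),
    "Three Months-One Year": (35, 3),
}


def chi_square(total: int, accepted: int, alpha: float = ALPHA) -> float:
    """Two-cell chi square (df = 1): accepted vs rejected against alpha * total."""
    exp_acc = alpha * total
    exp_rej = total - exp_acc
    rejected = total - accepted
    return (accepted - exp_acc) ** 2 / exp_acc + (rejected - exp_rej) ** 2 / exp_rej


def chi2_sf_df1(x: float) -> float:
    """Upper-tail probability of chi square with df = 1."""
    return math.erfc(math.sqrt(x / 2))


def chi2_sf_even_df(x: float, df: int) -> float:
    """Upper-tail probability of chi square for even df (closed form)."""
    half = x / 2
    return math.exp(-half) * sum(half ** i / math.factorial(i) for i in range(df // 2))


per_interval = {name: chi_square(*counts) for name, counts in INTERVALS.items()}
total_chi2 = sum(per_interval.values())  # summed over six 1-df values, so df = 6

for name, x in per_interval.items():
    print(f"{name:24} chi2 = {x:.4f}  p = {chi2_sf_df1(x):.3f}")
print(f"Total chi2 = {total_chi2:.4f}, df = 6, p = {chi2_sf_even_df(total_chi2, 6):.3f}")
```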

The conclusion reached from these results 
is that the two samples of operated patients 
utilized in this series of experiments were not 
affected in the same way. It seems that the 
hypothesized effects of lobotomy, as found in 
the first sample, are not so fundamental as 
to appear despite the changes in experimen- 
tal procedure. The original conclusions de- 
rived from the Monograph (12) should be in- 
terpreted with care, therefore, since it is quite 
possible that the original conclusions are spe- 
cific to the particular sample of patients un- 
der study and cannot be generalized to a uni- 
verse of lobotomized patients who may have 
undergone a variety of operative procedures. 


³ Tables giving, for each measure, the number of
cases, means, and the predicted and obtained net 
change in the operated sample for the periods (a) 
pretest—two weeks posttest, (b) pretest—three 
months posttest, (c) pretest—one year posttest, (d) 
two weeks posttest—three months posttest, (e) two
weeks posttest—one year posttest, and (f) three 
months posttest—one year posttest, have been de- 
posited with the American Documentation Institute. 
Order Document No. 4851 from ADI Auxiliary Pub- 
lications Project, Photoduplication Service, Library 
of Congress, Washington 25, D. C., remitting in ad- 
vance $2.00 for microfilm or $3.75 for photocopies. 
Make checks payable to Chief, Photoduplication 
Service, Library of Congress. 
Table 2

Outcome of 211 Empirical Predictions as to the Effect of Prefrontal Lobotomy

                             Total         Accepted      Rejected     Chi Square
Time Interval             Predictions   Predictions   Predictions    (df = 1)      p
Pre-Two Weeks
  Expected                   39.0          1.95         37.05
  Obtained                   39.0          3.00         36.00          0.5950      >.25
Pre-Three Months
  Expected                   30.0          1.50         28.50
  Obtained                   30.0          1.00         29.00          0.1753      >.50
Pre-One Year
  Expected                   32.0          1.60         30.40
  Obtained                   32.0          2.00         30.00          0.1052      >.50
Two Weeks-Three Months
  Expected                   42.0          2.10         39.90
  Obtained                   42.0          4.00         38.00          1.8094      >.10
Two Weeks-One Year
  Expected                   33.0          1.65         31.35
  Obtained                   33.0          2.00         31.00          0.0781      >.75
Three Months-One Year
  Expected                   35.0          1.75         33.25
  Obtained                   35.0          3.00         32.00          0.9397      >.25
Total
  Expected                  211.0         10.55        200.45
  Obtained                  211.0         15.00        196.00          3.7027      >.50
                                                                    (df = 6)


A study of changes in the present sample 
beyond the first year after lobotomy would 
be extremely valuable and was, indeed, 
planned by the present investigators. How- 
ever, it is not being carried out since the cur- 
rent use of chemotherapeutic techniques with 
most of our operated and control patients 
might mask any differences which might arise 
from lobotomy as such. 


Summary 


A second study of the changes in psycho- 
logical tests during the first year following 
prefrontal lobotomy has been carried out with 
a sample of 15 operated patients, all of whom 
were individually tested prior to, and two 
weeks, three months, and one year following 
prefrontal lobotomy. Ten equated control pa- 
tients were individually tested at comparable 
intervals. 

The present sample differed considerably 
from that used in the first study with respect 
to length of hospitalization, motivation, and 
type of operation. In addition, all patients 


were individually tested in the second experi- 
ment rather than being tested in a group. 

The psychological test battery consisted of 
27 tests from which 106 measures were de- 
rived. Two hundred and eleven specific pre- 
dictions were derived from the results of an 
earlier study (12) as to the direction of net 
change in the operated group for each of six 
time intervals: (a) pretest—two weeks post- 
test, (b) pretest—three months posttest, (c) 
pretest—one year posttest, (d) two weeks 
posttest—three months posttest, (e) two
weeks posttest—one year posttest, and (f) 
three months posttest—one year posttest. 
Each prediction was tested by ascertaining 
the statistical significance of the net change 
in means, or, in the case of dichotomized data, 
the net shift in proportion of the individuals 
possessing the attribute. One-tailed tests were 
used throughout the analysis with the level 
of acceptance being set in advance at the 
p = .05 level.

The conclusions of the study were as fol-
lows: (a) only 15 out of 211 specific predic-
tions were accepted (p = .05), an outcome
not significantly different from chance. (b)
These results suggest that the hypothesized 
effects of lobotomy, as found in the first ex- 
periment, are not so far-reaching as to appear 
despite the variations in experimental pro- 
cedure. (c) While a further study of changes 
beyond the first postoperative year might 
have been extremely valuable, it is not being 
carried out, since it would yield relatively 
little information as to the effect of lobotomy 
in the present sample because of the increas- 
ingly routine use of chemotherapeutic pro- 
cedures with both lobotomized and non- 
lobotomized patients. 


Received September 30, 1955. 
A Note on Research Methodology Used in Testing
for Examiner Influence in Clinical Test Performance 


Leon H. Levy 


Indiana University 


This note presents a brief methodological 
critique of research concerned with the prob- 
lem of how much of the variance in clinical 
test performance may be accounted for by 
the personality characteristics of the ex-
aminer (E).

The extent of actual research concerned
directly with this problem is quite limited,
and with the exception of Lord’s study (2),
each of these studies makes use of a simple-
randomized design (1). This design is characterized
by the random assignment of subjects (Ss) 
all drawn at random from the same parent 
population to various independently adminis- 
tered treatments (in this instance, Es). This 
is usually accomplished in one of two ways: 
(a) Ss are randomly assigned to each of sev-
eral Es, or (b) a random sample of cases al-
ready tested by each of several Es is pulled
from case files. In this latter case it is as- 
sumed that there was no bias operating in 
the initial assignment of cases to each of the 
Es. In order to test the hypothesis of E in-
fluence on test performance, differences in
mean scores between groups tested by vari- 
ous Es are then tested for significance. Any 
significant differences found are taken as evi- 
dence of E influence on test performance. 
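As a minimal sketch of the simple-randomized design just described, the code below shuffles a pool of subjects, deals them out to several examiners, and then computes the one-way analysis-of-variance F ratio for differences among examiner means. The examiner labels, the seed, and any scores passed in are hypothetical, and the note does not say which significance test the reviewed studies used.

```python
import random
from statistics import mean


def randomize(subjects, examiners, seed=None):
    """Shuffle subjects and deal them round-robin to examiners,
    giving groups whose sizes differ by at most one."""
    rng = random.Random(seed)
    pool = list(subjects)
    rng.shuffle(pool)
    groups = {e: [] for e in examiners}
    for i, subject in enumerate(pool):
        groups[examiners[i % len(examiners)]].append(subject)
    return groups


def one_way_f(groups):
    """F = between-examiners mean square / within-examiners mean square."""
    scores = [x for g in groups for x in g]
    grand = mean(scores)
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(scores) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)


# Hypothetical use: 30 subjects dealt out to three examiners.
assignment = randomize(range(1, 31), ["E1", "E2", "E3"], seed=7)
print({e: len(ss) for e, ss in assignment.items()})
```

As the note points out, randomization of this kind only lowers the chance of biased groups. It does not guarantee equivalent ones, and the one-way layout leaves any interaction effects unexamined.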

The limitations of the simple-randomized 
design may be summarized briefly as follows: 
Whatever control it accomplishes over sam- 
pling error is entirely contingent upon the 
randomization procedure used in initial case 
selection and assignment. Where case file 
data are used it is doubtful whether adequate 
randomization ever obtains; where Ss are as- 
signed at random the possibility of intersub- 
ject differences still exists since randomiza- 
tion at best reduces the likelihood of bias but 


does not completely control it. A further 
limitation of this design is its failure to take 
interaction effects into account. Since it would 
seem highly unlikely that these do not exist 
in this area, data thus far accumulated can- 
not be adequately interpreted until they are 
known. 

If case file data are to be used and one is 
willing to forego investigation and control of 
interaction effects, a paired-replicates design 
or treatments-by-levels design may be used. 
This would obviate to some extent the neces- 
sity of assuming initial random case assign- 
ment. The important requirement here would 
be that Ss in each of the E groups be matched
on psychologically meaningful variables. 
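The paired-replicates idea can be sketched for the case-file situation. Records already tested by examiner A are matched to records tested by examiner B on one matching variable, and the examiner difference is tested with a t ratio on the within-pair differences (df = number of pairs minus 1). The greedy nearest-neighbour matching, the record layout (matching score, test score), and the data are illustrative assumptions; the note does not prescribe a matching algorithm.

```python
import math
from statistics import mean, stdev


def match_cases(cases_a, cases_b, key):
    """Greedy nearest-neighbour matching of examiner-A records to
    examiner-B records on one matching variable. Each B record is used
    at most once; records left without a partner are dropped."""
    available = sorted(cases_b, key=key)
    pairs = []
    for case in sorted(cases_a, key=key):
        if not available:
            break
        best = min(available, key=lambda other: abs(key(other) - key(case)))
        available.remove(best)
        pairs.append((case, best))
    return pairs


def paired_t(pairs, score):
    """t ratio for the mean within-pair difference (A minus B)."""
    diffs = [score(a) - score(b) for a, b in pairs]
    return mean(diffs) / (stdev(diffs) / math.sqrt(len(diffs)))


# Hypothetical records: (matching score, test score).
cases_a = [(100, 12), (110, 15), (120, 18)]
cases_b = [(101, 11), (119, 15), (109, 13), (130, 20)]
pairs = match_cases(cases_a, cases_b, key=lambda c: c[0])
print(paired_t(pairs, score=lambda c: c[1]))
```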

Where the experimenter has complete con- 
trol over S selection and assignment, in addi- 
tion to the paired-replicates design he might 
also use a random replications design (1). 
This design further reduces the likelihood of 
sampling error and permits some control over 
interaction effects through randomization. 

Finally, although several other experimen- 
tal designs could be used more legitimately 
than the simple-randomized design, it would 
seem that because of the likelihood and im- 
portance of interaction effects and in terms 
of the amount of information yielded, any de- 
sign other than some type of factorial design 
must be considered profligate of time, effort, 
and data. 
This study attempted a simultaneous in-
vestigation of two factors in test administra-
tion, and their interaction. The factors studied 
were oral vs. written administration of the 
TAT, and the effects of the presence or ab- 
sence of an examiner under each of these 
methods of administration. 

In the two studies which have previously 
investigated oral vs. written administration 
of the TAT (3, 8), the oral condition was 
administered individually, and the written 
condition was administered in group form. 
For example, Eron and Ritter (3) adminis- 
tered the written condition to groups of six, 
with two subjects sharing one set of TAT 
cards. In addition to the unknown effects of 
the group situation, we have the further con- 
taminating factor of the interaction of the 
two subjects using the same set of cards. 
Furthermore, the individual oral administra- 
tion was according to standard directions, 
with no time limit, while the written group 
administration imposed a time limit of five 
minutes per card. It must be pointed out, 
however, that the conditions of Eron and 
Ritter’s study were necessary since they were 
interested in determining whether the group 
method could economically be used in gather- 
ing normative data. However, the two forms 
of administration are not strictly comparable 
for determining the relative usefulness of 
written vs. oral stories in a clinical situation. 
It would appear, then, that there remains a 
need for comparing oral vs. written TAT 
stories, both administered individually. 


1 The author is indebted to Dr. Richard H. Dana 
for his assistance in the collection of the data. Thanks 
are also due to the following persons who served as 
raters: Paul R. Binner, A. Frank Knotts, Mrs. Nancy 
M. Robinson, and Robert A. Spicer. 


As to the need for investigating the effect 
of the presence or absence of the examiner on 
test protocols, Sells (7) has reported that 
Rorschach responses appeared quite unin- 
hibited in the group test (with relative ab- 
sence of the examiner), but controlled in the 
individual administration. He goes on to state 
that “. . . one important hypothesis requir- 
ing careful study is that the presence of an 
examiner in the clinical test may inhibit the 
reporting of highly emotional content” (7, 
p. 25). 

In this study, then, by using a 2 × 2 fac-
torial design (1), we were able to investigate
these two factors, as well as their interaction, 
which potentially could be of even more sig- 
nificance than the individual factors them- 
selves. 
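The logic of a 2 × 2 factorial analysis can be sketched for the simplest case of equal cell sizes. Here factor A stands for oral vs. written administration and factor B for examiner absent vs. present. Each effect has 1 df, so its mean square equals its sum of squares, and each F is that mean square divided by the within-groups mean square. The data below are hypothetical. Because the four cells in this study were unequal (N = 17, 16, 18, and 16), an exact analysis of the actual data needs a method that handles unequal cells, which this sketch does not attempt.

```python
from statistics import mean


def anova_2x2(cells):
    """F ratios for a balanced 2 x 2 design.

    cells[i][j] holds the scores at level i of factor A and level j of
    factor B. Every cell must contain the same number of scores, n >= 2.
    """
    n = len(cells[0][0])
    if n < 2 or any(len(c) != n for row in cells for c in row):
        raise ValueError("each of the four cells needs the same n >= 2")
    cell_m = [[mean(c) for c in row] for row in cells]
    grand = mean(x for row in cells for c in row for x in c)
    a_m = [mean(row) for row in cell_m]                             # A marginals
    b_m = [mean(cell_m[i][j] for i in range(2)) for j in range(2)]  # B marginals
    ss_a = 2 * n * sum((m - grand) ** 2 for m in a_m)
    ss_b = 2 * n * sum((m - grand) ** 2 for m in b_m)
    ss_ab = n * sum((cell_m[i][j] - a_m[i] - b_m[j] + grand) ** 2
                    for i in range(2) for j in range(2))
    ss_within = sum((x - cell_m[i][j]) ** 2
                    for i in range(2) for j in range(2) for x in cells[i][j])
    ms_within = ss_within / (4 * (n - 1))   # df within = 4(n - 1)
    # Each effect has 1 df in a 2 x 2 design, so its mean square equals its SS.
    return {"A": ss_a / ms_within, "B": ss_b / ms_within,
            "AxB": ss_ab / ms_within}


# Hypothetical scores, two subjects per cell.
example = [[[1, 3], [5, 7]],   # A1: B1, B2
           [[3, 5], [7, 9]]]   # A2: B1, B2
print(anova_2x2(example))
```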


Method 


The subjects were 67 female college stu- 
dents—the entire sophomore class of the Uni- 
versity of Colorado School of Nursing. The 
subjects were randomized into four groups: 
(a) Oral, examiner absent; (b) Oral, ex-
aminer present; (c) Written, examiner ab-
sent; (d) Written, examiner present. There
were two examiners—one was the subjects’ 
psychology instructor, the other completely 
unknown to any of the subjects prior to the 
experiment.²

Subjects in all conditions were shown into 
a private office and were told they would find 
directions for what they were to do and the 


2 The small number of examiners in this study 
does not fulfill the requirements of representative de- 
sign, as suggested by Hammond (5), and the con- 
clusions will therefore be limited to the two ex- 
aminers in this study until the number of examiners 
has been increased. 
necessary materials on the desk. On each desk
was a set of TAT cards turned face down 
with card one on top. The cards were the 
third revision, and included all twenty pic- 
tures recommended by Murray (6) for use 
with adult females. The following directions 
for all subjects were typewritten on individual 
cards: 


This is a task which you should find interesting. 
On the table before you is a set of pictures which 
you are to look at one at a time, and make up as 
dramatic a story as you can for each one. Tell what 
has led up to the event shown in the picture, de- 
scribe what is happening at the moment, and then 
give the outcome. Since this is a test of how good 
your imagination can be, try to tell as much as pos- 
sible about what the characters are feeling and 
thinking. 

When you get to Card No. 16, you will find it is 
a blank card. Imagine a picture for this card, and 
then go ahead and give a story about this imagined 
picture. 

Be sure to number your stories to correspond to 
the numbers on the cards. (Be sure to dictate the 
number of the card before dictating your story.) 

No further instructions will be given. 


All subjects who were to be assigned to the 
oral conditions were given group instruction 
on the use of the dictaphone prior to being 
shown into the office in which they were to 
work. Each desk was equipped with either 
a pad of paper and pencils or a dictating
machine, depending on the experimental con- 
dition. The only difference between the ex- 
aminer-present and examiner-absent condi- 
tions was that in the examiner-present 
situation, one of the examiners sat across 
the desk from the subject throughout the 
session. 

Each story was coded and typed on a 
duplicating stencil so that comparable sets of 
stories could be presented to the raters. 
Stories were rated on Eron, Terry, and Cal- 
lahan’s scale for emotional tone (4); on 
Eron’s scale for outcomes (2); and on Terry’s 
scale for level of response (9). Only one vari- 
able was rated at a time and all the stories 
for a given picture were rated together. All 
stories were independently rated by two 
judges. Pearson r reliability coefficients were 
.88 for emotional tone, .82 for outcomes, and 
.83 for level of response. 
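The interjudge reliabilities above are Pearson product-moment correlations between the two judges' ratings. A minimal sketch of that computation follows; the rating lists are hypothetical, not data from the study.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least 2 items")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical emotional-tone ratings of the same five stories by two judges.
judge_1 = [-2, 0, 1, -1, 2]
judge_2 = [-1, 0, 2, -1, 1]
print(round(pearson_r(judge_1, judge_2), 2))
```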


Results 


Table 1 shows the differences in the means 
for the three scales under each of the four 
conditions of administration. For each scale, 
the examiner-absent condition yielded a sig- 
nificantly higher mean than the examiner- 
present condition, whether the stories were 


Table 1

Differences in Means for All Scales

                     Oral,      Oral,      Written,   Written,
                     Examiner   Examiner   Examiner   Examiner
                     Absent     Present    Absent     Present    Mean
Variable             (N = 17)   (N = 16)   (N = 18)   (N = 16)   Difference   t
Emotional Tone       -17.06     -11.13                            5.93        3.01**
                     -17.06                -17.33                  .27         .15
                                -11.13                -10.25       .88         .49
                                           -17.33     -10.25      7.08        4.19***
Outcomes              -8.71      +3.69                           12.40        [illegible]
                      -8.71                 -6.61                 2.10         .67
                                 +3.69                 +3.69       .00         .00
                                            -6.61      +3.69     10.30        4.68***
Level of Response     68.47      58.50                            9.97        2.98**
                      68.47                 63.61                 4.86        1.83
                                 58.50                 60.00      1.50         .50
                                            63.61      60.00      3.61        1.60

Note.—For each of the examiner-present conditions the scores for the two examiners were combined since there were no significant differences on any scale between the two examiners.
** Significant at the .01 level of confidence.
*** Significant at the .001 level of confidence.
Table 2

Summary of Analysis of Variance

                     Source of                           Mean
Variable             Variation                    df     Square      F
Emotional Tone       Oral vs. Written              1         .55       .017
                     Exam. abs. vs. Exam. pres.    1      708.99     21.193***
                     Interaction                   1        6.30       .188
                     Within groups                63       33.45
                     Total                        66
Outcomes             Oral vs. Written              1       14.55       .228
                     Exam. abs. vs. Exam. pres.    1     2140.59     33.541***
                     Interaction                   1       23.81       .373
                     Within groups                63       63.82
                     Total                        66
Level of Response    Oral vs. Written              1       51.53       .761
                     Exam. abs. vs. Exam. pres.    1      748.81     11.062**
                     Interaction                   1      177.67      2.625
                     Within groups                63       67.69
                     Total                        66

Note.—In each case the hypothesis of homogeneity of variance is tenable.
** Significant at the .01 level of confidence.
*** Significant at the .001 level of confidence.


oral or written (with the exception of the 
written stories for level of response). There 
were no significant differences between oral 
and written stories for any of the scales, 
whether the examiner was present or absent. 
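Each t in Table 1 compares two of the four independent groups. The article does not give its formula, so the sketch below assumes the usual pooled-variance t for two independent means, with df = n1 + n2 - 2. The scores are hypothetical.

```python
import math
from statistics import mean, variance


def pooled_t(group_1, group_2):
    """Independent-samples t with a pooled variance estimate; returns (t, df)."""
    n1, n2 = len(group_1), len(group_2)
    df = n1 + n2 - 2
    # variance() is the sample variance (divisor n - 1).
    pooled_var = ((n1 - 1) * variance(group_1)
                  + (n2 - 1) * variance(group_2)) / df
    se_diff = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean(group_1) - mean(group_2)) / se_diff, df


# Hypothetical scale scores for two small groups.
t, df = pooled_t([4, 5, 6], [1, 2, 3])
print(f"t = {t:.2f}, df = {df}")
```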

Table 2 shows the results of the analysis 
of variance for each scale. In each case, the 
only significant F was for the examiner-ab- 
sent vs. examiner-present condition. There 
was no significant interaction between the 
two testing factors studied. 


Discussion 


It has been demonstrated, within the limits 
of this experiment, that there is no appreci- 
able difference between TAT stories given 
orally or written by the subjects in this ex- 
periment, as far as emotional tone, outcomes, 
and level of response are concerned. How- 
ever, with the examiner absent the stories 
are sadder, have sadder outcomes, and show 
greater involvement on the part of the sub- 
ject. These findings clearly support Sells’ sug- 


gestion that the presence of an examiner may 
inhibit highly emotional content (7). It was 
noted by the examiners in the present study 
that subjects frequently looked up to see if 
the examiner was being attentive; and in the 
written condition, many subjects put their 
completed stories under the pad of paper, as 
if to assure that the examiner could not read 
them. 

However, the mere physical absence of the 
examiner may not be the important variable. 
It seems more likely that what may be op- 
erative is the subject’s expectancy for im- 
mediate evaluation. For example, Terry (8) 
reported a significantly higher level of re- 
sponse for individually administered TAT 
stories than for those given by written group- 
administration. Although the examiner was 
relatively absent from the group administra- 
tion, she noted that “After completion of the 
first story all subjects were reminded of the 
directions if they had omitted any part of the 
story, such as the ending, or what had hap- 
pened before to the characters” (8, p. 14). 
What may have happened here is that the 
subjects, after having had their first story 
evaluated by the examiner, expected that 
further evaluation might follow. On the other 
hand, this expectancy was apparently not cre- 
ated in the group administration of the Ror- 
schach reported by Sells (7). In the present 
study, the examiner did not read or hear any 
of the stories given in the examiner-absent 
condition until the subject had left the ex- 
amining room. In other words, it is tenta- 
tively proposed that the creation of an ex- 
pectancy for immediate evaluation, even if 
further evaluation does not in fact occur, 
changes the testing-field situation for the sub- 
ject from one of examiner-absent to examiner- 
present. It is planned to study this possibility 
in future experiments. 


Summary 


Sixty-seven female college sophomores were 
randomized into four groups for administra- 
tion of the TAT under the following condi- 
tions: oral, examiner absent; oral, examiner 
present; written, examiner absent; and writ- 
ten, examiner present. The resulting 1,340 
stories were rated for emotional tone, out- 
comes, and level of response. Analysis of 
variance indicated that the only significant
difference in the stories on each scale was a 
function of the presence or absence of the 
examiner, with the examiner-absent condition 
yielding higher means on each of the scales. 
There were no apparent differences between 
written and oral stories, nor was there any 
significant interaction between the two vari- 
ables studied. 

The data support the hypothesis that the
presence of an examiner in a test situation 
acts as an inhibiting factor for strongly emo- 
tional material. It was suggested that the 
variable operative may not have been the 
mere physical presence or absence of the ex- 
aminer, but rather an expectancy created in 
the subject for either immediate or more re- 
mote evaluation of his productions. 


Received October 28, 1955. 
A number of recent studies have begun to
emphasize the role of situational variables in 
psychological testing (1, 2, 3, 6). Such vari- 
ables include the examiner, the test instruc- 
tions and the set of the examinee, and the 
place of testing. Although these studies have 
effectively demonstrated the importance of 
such variables, they have not provided a 
basis for the regular or systematic prediction 
of the nature and amount of variation in per- 
formance due to such variables. 

Social learning theory provides the hypoth-
esis that one useful way to characterize situa-
tional variables and their effect on behavior
is in terms of the expectancies they arouse for 
reinforcement. The expectancy that a par- 
ticular reinforcement will follow from a par- 
ticular behavior is in part a function of the 
specific situation and the expectancy that a 
particular reinforcement will be followed by 
later specific reinforcements is also a function 
of the situation. More detailed descriptions 
of these relationships have been made by 
Rotter (4, 5). 

Simply stated, situations provide cues for 
expected reinforcements, both immediate and 
delayed. If behavior is directed toward maxi- 
mum gratification (or minimum punishment) 
we should be able to predict that behaviors 
characteristically directed toward particular 
reinforcements (or away from in the case of 
negative reinforcements) would be most likely 
to occur in situations which we can identify 
on the basis of previous experience as most 
likely to provide the reinforcements in ques- 
tion. 

Although this is a simple, common-sense 
proposition, it has not been put to work 


systematically in attempting to predict the 
effect of such variables as the personality or 
sex of the examiner or the time and place 
of testing for either clinical or experimental 
testing. Many failures to duplicate test re- 
sults both in clinical work and in experi- 
mental investigations have been ascribed to 
unreliability in the subjects or test instru- 
ments rather than to differences in the test- 
ing situation. 

The present study is an attempt to test the 
hypothesis that the preference values of a list 
of reinforcements will differ predictively de- 
pending upon the situation in which they 
are obtained when random groups of sub- 
jects drawn from the same population are 
used. 

More specifically, it was hypothesized that 
differences greater than chance would be 
found in the rankings of three kinds of re- 
wards—judged to be academic achievement 
rewards, athletic achievement rewards, and 
manual skill rewards—when obtained in an 
English class, a gymnasium, and a wood-
working class by the respective instructors in
those classes. It was hypothesized that aca- 
demic achievement reinforcements would be 
rated by subjects to have higher value in the 
English class, athletic achievement in the 
gymnasium, and manual and mechanical skill 
achievement rewards in the woodworking 
class. This study was also undertaken to ex- 
plore one methodology for determining the 
significance of the psychological situation. 


Procedure 


It was important that the items provided 
for ranking should all be of approximately 
equal value, and that this value be moderate. Re-
inforcements of moderate value would be free
to move either up or down in value as the sit- 
uation changed. Three lists of reinforcements 
(each containing 10 Athletic, 10 Academic, 
and 10 Manual reinforcements in random 
order) were constructed. Each reinforcement 
had previously been unanimously categorized 
as exclusively either Manual (M), Academic 
(Ac), or Athletic (At) by five judges. Three 
groups of male seventh and eighth graders 
(comprising a total of 99 subjects) were
asked to categorize each item under one of 
five headings: (1) Like hardly at all; (2) 
Like some; (3) Like pretty well; (4) Like a 
lot; and (5) Like very much. The subjects 
were thereby to assign a value to each rein- 
forcement. Since the expectancy of the oc- 
currence of a reinforcement is a determiner of 
the value of that reinforcement, this ex- 
pectancy was held constant for each reinforce- 
ment by this statement prefacing each list: 
“Imagine that you could do, become, or have
all the things in the sentences below. . . .” 

The data from the administration of these 
lists were analyzed as follows: for each item, 
the number of subjects placing it under each 
heading was tabulated; each heading was as- 
signed a numerical value of from 1 to 5— 
thus, “Like hardly at all” received a value of 
1 and “Like very much” received a value of 
5; a value for each item was obtained by 
multiplying the number of cases under each 
heading by the value of that heading; then, 
for each item, the five values were summed 
and divided by the number of subjects who 
dealt with that item. 
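The scoring rule just described is a weighted mean. Each heading's value (1 to 5) is weighted by the number of subjects who chose it, and the total is divided by the number of subjects who rated the item. A minimal sketch, with a hypothetical tally for a single item:

```python
# Numerical values assigned to the five rating headings.
HEADING_VALUE = {
    "Like hardly at all": 1,
    "Like some": 2,
    "Like pretty well": 3,
    "Like a lot": 4,
    "Like very much": 5,
}


def item_value(counts):
    """Mean rating of one item: sum(value * count) / subjects rating the item."""
    n_subjects = sum(counts.values())
    if n_subjects == 0:
        raise ValueError("no subjects rated this item")
    return sum(HEADING_VALUE[h] * k for h, k in counts.items()) / n_subjects


# Hypothetical tally for one reinforcement item (33 subjects).
tally = {"Like hardly at all": 3, "Like some": 7, "Like pretty well": 12,
         "Like a lot": 8, "Like very much": 3}
print(round(item_value(tally), 2))
```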

From these data it was possible to get three 
matched groups of At, Ac, and M reinforce- 
ment items, each group having six items and 
the following mean values: Athletic 3.06; 
Academic 3.04; Manual 3.05. These figures 
satisfy the criterion of moderate value. 

These 18 items were then randomly ar- 
ranged into the list to be used in the final 
research, and contained such representative 
items as, “Be praised by the teacher for writ- 
ing a good book report,” “Have your friends 
admire you for getting good grades,” “Win a 
wrestling match with a friend,” “Know more 
about baseball than the other guys,” “Get a 
tool set for Christmas,” “Win a prize for
building the best magazine stand in school.” 

Subjects for the final phase of the experi- 
ment were 78 eighth-grade and 72 seventh- 
grade boys, all from the same school. Each of 
the three situations, English classes (Aca-
demic), gym classes (Athletic), and wood-
working classes (Manual), was a required
class for all male students of the school.
Fifty different subjects ranked the list of re- 
inforcement items in each situation. In all 
three situations, the ranking procedure was 
not initiated until the classes had been under- 
way for at least ten minutes, to help ensure 
the representativeness of the situations. In 
the English classes, the ranking was done 
with the subjects seated at their desks; in the 
gym classes the subjects were clad in gym 
clothes and seated on bleachers; in the wood- 
working classes the subjects did the ranking 
at their benches. In every case, the particular 
class instructor administered the list. Since 
all were required courses it was felt that the 
samples of subjects in each situation were 
large enough to ensure random selection of 
seventh and eighth graders for this school 
population. 


Results 


The mean ranks for the three groups of 
reinforcement in each situation are shown in 
Table 1. The lower the score the higher the 
preference. 

Reliability coefficients for the list of rein- 
forcements were computed by the odd-even 
method and corrected by the Brown-Spear- 
man formula. They were .79 for Athletic re- 
inforcements in the Athletic situation; .91 
for Academic reinforcements in the Academic 
situation; .83 for Manual reinforcements in 
the Manual situation. 
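The odd-even method splits each subject's items into odd- and even-numbered halves, correlates the two half scores, and then steps the half-test correlation up to full length with the Spearman-Brown (here "Brown-Spearman") formula, r_full = 2r / (1 + r). A minimal sketch follows. The layout (one list of item scores per subject) and the scores are assumptions for illustration.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman_brown(r_half, factor=2):
    """Reliability of a test `factor` times as long as the one giving r_half."""
    return factor * r_half / (1 + (factor - 1) * r_half)


def odd_even_reliability(item_scores):
    """item_scores holds one list of item scores per subject."""
    odd = [sum(s[0::2]) for s in item_scores]   # items 1, 3, 5, ...
    even = [sum(s[1::2]) for s in item_scores]  # items 2, 4, 6, ...
    return spearman_brown(pearson_r(odd, even))


# Hypothetical scores: four subjects, six items each.
scores = [[3, 4, 2, 5, 1, 4], [5, 5, 4, 4, 3, 5],
          [2, 1, 3, 2, 2, 1], [4, 3, 5, 4, 4, 3]]
print(round(odd_even_reliability(scores), 2))
```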


Table 1

Comparison of Mean Ranks of Reinforcement
Items in Three Situations

                          Situation                 Mean of
Items           Academic   Athletic   Manual    3 situations
Academic          9.54      10.21      9.26        9.67
Athletic          9.83       8.40      9.90        9.38
Manual            9.12       9.82      9.28        9.41
Table 2

Significance of Differences of Means of Pairs of
Reinforcements for Two Situations

Situation and
reinforcements            Difference    p*
Academic vs. Athletic        2.10       <.001
Athletic vs. Manual          2.04       <.001
Manual vs. Academic           .44       >.10

* p values based on a one-tailed test of significance, since
direction as well as change was predicted.


Since the differences between mean rank- 
ings were slight, difference scores were used 
to test the hypothesis by calculating the sig- 
nificance of (a) the difference between mean 
ranks of the Academic and Athletic rein- 
forcements in the Academic situation as com- 
pared with that difference in the Athletic 
situation; (b) the difference between mean
ranks of the Manual and Athletic reinforce- 
ments in the Manual situation as compared 
with that difference in the Athletic situation; 
and (c) the difference between mean ranks 
of the Manual and Academic reinforcements 
in the Manual situation as compared with 
that difference in the Academic situation. The 
data are shown in Table 2. 
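The Difference column of Table 2 can be reproduced from the mean ranks in Table 1. For a pair of situations X and Y, the sketch below adds the rank advantage of X's own items in situation X to the rank advantage of Y's own items in situation Y. This sum equals the change in the between-category rank difference from one situation to the other, and positive values run in the predicted direction. For the Manual and Academic pair the result is -.44, that is, opposite to prediction; Table 2 lists its size, .44. The code reproduces only the differences, not the significance levels, since the article does not report the error terms.

```python
# Mean ranks from Table 1, indexed MEAN_RANK[situation][item category].
# Lower mean rank = higher preference.
MEAN_RANK = {
    "Academic": {"Academic": 9.54, "Athletic": 9.83, "Manual": 9.12},
    "Athletic": {"Academic": 10.21, "Athletic": 8.40, "Manual": 9.82},
    "Manual": {"Academic": 9.26, "Athletic": 9.90, "Manual": 9.28},
}


def predicted_shift(x, y):
    """Change in the rank gap between category-x and category-y items when
    moving between situations x and y. Positive = predicted direction
    (each situation's own items ranked better, i.e. lower, there)."""
    gap_in_x = MEAN_RANK[x][y] - MEAN_RANK[x][x]  # y-items minus x-items, in x
    gap_in_y = MEAN_RANK[y][x] - MEAN_RANK[y][y]  # x-items minus y-items, in y
    return gap_in_x + gap_in_y


for x, y in [("Academic", "Athletic"), ("Manual", "Athletic"),
             ("Manual", "Academic")]:
    print(f"{x} vs. {y}: {predicted_shift(x, y):+.2f}")
```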

It will be seen that significant results were 
found in the case of (a), differences between 
the Athletic and Academic reinforcements in 
these two situations and (b), differences be-
tween the Athletic and Manual reinforce-
ments in these two situations. No significant
differences, however, were obtained between 
the rankings of (c), Academic and Manual 
reinforcements in these two situations. 

This latter failure to demonstrate differ- 
ences may be partially explained, it is felt, on 
the basis of the greater similarity of these 
two situations (M and Ac) as compared with 
the Athletic situation, which contained rather 
distinctive cues. An additional explanation
emerged from a post hoc analysis of the Manual
items, such as “Have your dad let you tinker
around with his car” and “Work on your
bicycle after school,” which showed that several of
them had little in common with what went
on in the woodworking classes and did not,
as had been presumed, involve skills similar to
those rewarded in a woodworking class.


Discussion 


The results show that two of the three 
predicted differences occurred to a significant 
degree. Although the specific test responses 
studied may not be of great significance in 
themselves, the study suggests that such rela- 
tively unimportant variables as the physical 
setting of the testing and the subjects’ pre- 
sumed characterization of the examiner (Eng- 
lish teacher, gym teacher, etc.) may generally 
have demonstrable effects on test results. It 
seems likely that similar effects may occur if 
the subject is a hospital patient and sees the 
examiner as someone in authority or not, or
if in a typical experiment the subjects pre- 
sume that the experimenter is an instructor 
or a graduate student, or if they anticipate 
that the examiner is one who can provide 
them with help, or appears to appreciate 
social skills more than intellectual ones, etc. 
In other words, the present study lends some 
support to the hypothesis that in both clinical 
and experimental testing, the group or in- 
dividual subject’s behavior must be appraised 
in light of the reinforcements they expect to 
be more likely to occur in the particular situ- 
ation provided by the physical setting and the 
examiner, as well as by the test instructions. 
Ernest S. Barratt
Since its clinical inception in 1949, the
Wechsler Intelligence Scale for Children 
(WISC) (5) has become one of the leading 
tests for measuring the general intellectual 
level of children. Two other scales, the Pro- 
gressive Matrices (PM) (4) and the Colum- 
bia Mental Maturity Scale (CMMS) (1), 
are becoming increasingly popular measuring 
instruments of the general reasoning ability 
of children. 


The CMMS was designed to provide “an estimate 
of intellectual ability in the age range 3 to 12 years” 
(1, p. 1). The CMMS is especially useful for test- 
ing children with speech difficulties and other handi- 
caps. 

The PM items appear to measure factorially com- 
plex spatial ability and reasoning by analogy. Al- 
though the author of the PM does not describe it 
as a test of g (after Spearman), the manual reports 
that it had a g saturation in one study of .82 (4, p. 
2). In a recent normative study of the PM (1947 
revision), the authors concluded that the PM should 
be considered “as a test of fairly complex intellectual 
reasoning processes” (2, p. 142) rather than a test 
of nonverbal reasoning ability. 


Since the entire WISC cannot always be 
administered because of time limitations or 
physical handicaps of the subject, it would be 
helpful to know to what extent the PM or 
CMMS could be substituted for it. The pur- 
pose of the present study is to note the rela- 
tionship of the WISC to the CMMS and PM. 


Procedure 


Seventy children, the entire fourth-grade 
enrollment in the Medill Elementary School, 
Newark, Delaware, were given the WISC, 
PM (1938), and CMMS as individually ad- 
ministered tests. The directions in the re- 
spective manuals for administering the tests 


were followed, with one exception, i.e., in ad- 
ministering the PM, the test was stopped 
when a subject missed six consecutive items. 
The age range of the children was from 9.2 
years to 10.1 years. The children came from 
both rural and urban homes. There were 26 
boys and 34 girls in the study. The 1938 edi- 
tion of the PM was used since it has been 
the author’s experience that in testing children 
of 8 years and older, sufficient variance is ob- 
tained with the 1938 edition to obviate sub- 
stituting the 1947 revision of the PM. 


Results and Discussion 


Table 1 contains the means and standard 
deviations (SD) for the various tests used in 
this study. The WISC means and SDs are in 


Table 1

Means and Standard Deviations of the
WISC, PM, and CMMS

Test                        Mean       SD
WISC*
  Total                    108.92    18.02
  Verbal                    55.14    11.11
  Performance               53.78     8.68
  Information               11.04     2.98
  Comprehension             10.30     3.68
  Arithmetic                12.30     2.66
  Similarities              11.59     3.13
  Vocabulary                10.81     3.78
  Digit Span                10.17     2.36
  Picture Completion        10.63     3.31
  Picture Arrangement       11.04     3.14
  Block Design              10.82     3.44
  Object Assembly           10.94     2.46
  Coding                    10.69     2.68
  Mazes                     10.24     2.60
PM                          24.11     7.33
CMMS                        78.37     8.18

* All WISC scores are Scaled Score Units (5).
scaled score units to facilitate comparing the
present sample with the original standardiza- 
tion sample (mean = 10, SD = 3 for WISC 
standardization subtests). Although the cur-
rent WISC subtest means are slightly above
average compared to the original WISC stand-
ardization sample, none of the deviations is
significantly different from 10. The t ratio for
the largest deviation from 10 (arithmetic,
12.30) is approximately 1.1. The WISC sub-
test SDs range from 2.36 (digit span) to 3.78
(vocabulary), with the average of the
SDs being 3.02. The subjects did slightly 
better on the verbal scales than on the per- 
formance scales but were less variable on the 
performance scales. 

The PM mean of 24.11 corresponds to a 
chronological age (CA) of 9.5 years in the 
standardization sample (4, p. 12). The PM 
range in the current study is 3 to 43; the SD 
is 7.33. Using the 1947 Colored PM, Green 
and Ewert (2, p. 140) obtained a range of 9 
to 35 and Martin and Wiechers (3, p. 144) 
obtained a range of 8 to 35 with comparable 
age groups. It is this author’s opinion that 
the 1938 PM provides a better differentiation 
of ability at the extremes than does the 1947 
Revision of the PM with children above a CA 
of 8 years. 

The CMMS mean in this study corresponds
to a mental age of 12 years and an IQ of 144
in the standardization sample (1,
p. 8). Although the authors of the CMMS 
indicate that the IQs of the standardization 
group on the CMMS correspond to their 
Stanford-Binet IQs (1, p. 11), the CMMS 
has consistently yielded higher IQs than 
either the Binet or WISC in the University 
of Delaware Laboratory Clinic. 

Table 2 contains the uncorrected correlations
of the PM and CMMS with the WISC.
The PM was significantly related to the
WISC total, verbal, and performance scores
and to all of the WISC subtests except
Comprehension. In the rank order of the
PM-WISC subtest rs, the PM correlates
highest with those tests involving spatial
reasoning (block design), verbal reasoning of a
more or less abstract nature (similarities),
and acquired knowledge (information and
vocabulary). The PM has the lowest
correlations with tests that involve reasoning in a
cultural context (comprehension, picture
arrangement) and tests involving perceptual
speed and psychomotor tasks (coding, mazes).


Table 2


The Relationship of the Wechsler Intelligence Scale
for Children to the PM and CMMS

WISC                    CMMS          PM
Total                   .606*         .754*
Verbal                  .559*         .692*
Performance             .478*         .699*
Information             .129          .585*
Comprehension           .330*         .075
Arithmetic              .680*         .540*
Similarities            .425*         .589*
Vocabulary              .452*         .561*
Digit Span              (illegible)   .419*
Picture Completion      .377*         .415*
Picture Arrangement     .484*         .300*
Block Design            .473*         .601*
Object Assembly         .285**        .388*
Coding                  .442*         .332*
Mazes                   .642*         .323*
CMMS                                  .576*

* 1% level of confidence.
** 5% level of confidence.
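The rank ordering described above can be checked mechanically by sorting the PM column of Table 2. The Python sketch below is only an illustration: the dictionary name `pm_r` is hypothetical, and its values are the PM correlations as transcribed from Table 2.

```python
# PM correlations with the WISC subtests, as transcribed from Table 2.
pm_r = {
    "Information": 0.585,
    "Comprehension": 0.075,
    "Arithmetic": 0.540,
    "Similarities": 0.589,
    "Vocabulary": 0.561,
    "Digit Span": 0.419,
    "Picture Completion": 0.415,
    "Picture Arrangement": 0.300,
    "Block Design": 0.601,
    "Object Assembly": 0.388,
    "Coding": 0.332,
    "Mazes": 0.323,
}

# Rank subtests from the highest to the lowest correlation with the PM.
ranked = sorted(pm_r, key=pm_r.get, reverse=True)

print("Highest four:", ranked[:4])
print("Lowest four:", ranked[-4:])
```

The top four (block design, similarities, information, vocabulary) and the bottom four (coding, mazes, picture arrangement, comprehension) correspond to the groupings discussed in the text.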

The CMMS was also significantly related 
to the WISC total, verbal, and performance 
scores and to all of the WISC subtests except 
information. When the CMMS-WISC subtest rs
are ranked, some of them are difficult to
interpret. The CMMS had a higher r with
arithmetic than with any other subtest; since
numerical concepts (counting, relative length,
relative size, etc.) are involved in many of the
CMMS items, this is not surprising. It is
difficult, however, to account for the high r
between the CMMS and mazes. The author
could think of no logical connection between
the problem-solving behavior required by the
CMMS and by the maze items that would not
apply equally to some of the other subtests
with lower CMMS rs.

The PM correlated higher than the CMMS 
with the WISC total, performance, and verbal 
scores. About 57 per cent of the WISC total 
variance was in common with the PM, while 
only 36 per cent of the WISC total variance 
was in common with the CMMS. At the age 
level of the subjects in the current study, the
PM appears to be a more appropriate substitute
than the CMMS for the WISC total score.
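The "variance in common" figures are squared correlation coefficients (r²). A minimal Python sketch using the two WISC-total correlations from Table 2; the variable names are illustrative only.

```python
# Correlations of each test with the WISC total score (Table 2).
r_with_wisc_total = {"PM": 0.754, "CMMS": 0.606}

# Shared (common) variance is the square of the correlation coefficient.
shared_variance = {test: r ** 2 for test, r in r_with_wisc_total.items()}

for test, share in shared_variance.items():
    print(f"{test}: r = {r_with_wisc_total[test]:.3f}, r squared = {share:.1%}")
```

This gives about 56.9 per cent for the PM and 36.7 per cent for the CMMS, close to the 57 and 36 per cent reported in the text.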








Summary and Conclusions 


The Columbia Mental Maturity Scale 
(CMMS) and the Progressive Matrices 
(1938) (PM) were related to the WISC. 
Seventy fourth-grade children were tested. 
Both the PM and the CMMS were signifi- 
cantly related to the WISC total, verbal, and 
performance scores. The PM had more vari- 
ance in common with the WISC total score 
than the CMMS. The relationship of the PM 
and CMMS to the WISC subtests was also 
discussed. 

In an undergraduate course in which the
writer was enrolled, the question was once
raised as to whether there is any relationship 
between intelligence and psychological adjust- 
ment. The professor replied that no one could 
actually answer the question because relevant 
studies had always had to measure intelli- 
gence of maladjusted persons after they had 
become ill enough to appear in a hospital or 
clinic. It was speculated at the time that the 
question might ultimately be answered as an 
outgrowth of some plan of mass mental meas- 
urement such as the testing of all draftees 
during World War II. A few years later, when 
the writer was a VA trainee in clinical psy- 
chology, this point was recalled and it was 
realized that he was in a favorable position 
to obtain pre-illness (or at any rate pre-acute- 
illness) intelligence scores of a large number 
of psychiatric patients. 

Since psychiatric patients can be considered
as persons who have failed to make adequate
psychological adjustments, by examining their
Army General Classification Test scores (the
AGCT is described in references 1, 8, and 9)
we would be able to make inferences about the
relationship between intelligence and
psychological adjustment in the same manner
that we might study the relationship between
intelligence and scholarship by examining the
IQ scores of students who made F’s in courses.
It would not be the whole story, but it would be
a beginning. Further, information about how
persons who are now psychiatric patients
performed on an intelligence test given before
they became patients is important to the body
of psychological knowledge. The performance
of patients in different diagnostic groups would
also be important to ascertain.


1 This study is based upon a dissertation submitted
by the writer in partial fulfillment of the requirements
for the Ph.D. degree in psychology. The
writer is indebted to Professor John M. Hadley for
his critical and helpful service.

2 From the Veterans Administration Hospital,
Marion, Indiana.

3 Now at the Veterans Administration Hospital,
Long Beach, California.

Investigation as to the practicality of the 
project disclosed that VA installations could 
obtain the AGCT scores of patients by sending
a form to the Army. Filling out these forms
for a large number of cases was certainly a
long and monotonous task, but it was judged
worth the effort, and the study was
undertaken. The project was carried out
at the VA neuropsychiatric hospital at 
Marion, Indiana. 


Planning the Study 


In the literature pertinent to the question 
of relationship between intelligence and psy- 
chological adjustment, three general types of 
hypotheses were found to predominate. These 
were: (a) that there is a direct, positive
relationship between intelligence and
psychological adjustment; (b) that a particular
intelligence level is optimal for adjustment in
our society; (c) that patients in certain psy- 
chiatric diagnostic groups are brighter than 
those in others. These hypotheses led to the 
choice of chi square as the appropriate sta- 
tistic for the present study. 

If intelligence and adjustment are posi- 
tively related, comparing AGCT scores of 
mental patients to those of controls by chi 
square should show that this maladjusted 
group had scored lower than the normals. In 
addition, inspection of the chi-square table 
would reveal any range of intelligence with 
low representation among patients, in accord- 
ance with the hypothesis of an optimal range 
of intelligence for adjustment. Each diagnostic
group could be compared separately to con- 
trols by chi square to determine whether the 
group had scored significantly low or high on 
the AGCT. 
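In the design just outlined, the control distribution supplies the expected proportion in each class interval, and a chi-square goodness-of-fit statistic measures how far the patients’ observed frequencies depart from those expectations. The Python sketch below illustrates that computation under stated assumptions: the function name is hypothetical, degrees of freedom are taken as the number of intervals minus one, and the counts and proportions are invented for illustration rather than drawn from this study.

```python
def chi_square_vs_control(observed, control_props):
    """Chi-square goodness of fit of observed counts to control proportions.

    observed: patient frequencies in each class interval.
    control_props: proportion of the control group in each interval (sums to 1).
    Returns (chi_square, df), with df = number of intervals - 1 (assumed).
    """
    n = sum(observed)
    chi2 = 0.0
    for obs, prop in zip(observed, control_props):
        expected = n * prop  # frequency expected if patients matched controls
        chi2 += (obs - expected) ** 2 / expected
    return chi2, len(observed) - 1


# Hypothetical example: 100 patients spread over five class intervals.
observed = [14, 22, 30, 20, 14]
control_props = [0.10, 0.20, 0.35, 0.25, 0.10]
chi2, df = chi_square_vs_control(observed, control_props)
print(f"chi square = {chi2:.2f}, df = {df}")
```

Inspecting the individual (observed − expected) terms, rather than only the total, is what would reveal an interval with unusually low patient representation, as the "optimal range" hypothesis requires.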

The literature provided further guidelines
in the plan of our study. That careful attention
to design was necessary was shown by
Lorge’s review of studies of intelligence and
adjustment, published in 1940 (6). He found
a range of correlation coefficients from minus
.49 to plus .70, with a median of .04. Of these
findings he says that the range is so “extraordinary
that anybody can make any statement.”
Such diverse findings must come from
differences in experimental design. Spragg
(7) and Williams (11) have discussed typical
defects of design which apply to many
relevant studies. Small samples and inadequate
controls are frequent limitations; fortunately,
plentiful and adequate data were available for
the present project.


Collecting the Data 


Nearly ideal control data were available in 
Davenport’s distribution of AGCT scores of 
290,163 Fifth Service Command Army in- 
ductees, so geographical area could be held 
constant. However, one difficulty was
encountered: Davenport’s distribution is broken
down into only five class intervals. To make
our test as sensitive as possible to the
occurrence of an “optimal range,” a breakdown
into five-point class intervals was desired.⁴
To accomplish this, Davenport’s percentage
values were further broken down according to
values from a distribution of AGCT scores
for the total Army.⁵ Careful consideration
was given to this procedure, which robs our
design of elegance, but it was not judged to
introduce any practical bias.
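The text does not spell out how Davenport’s coarse-interval percentages were subdivided. One plausible reading, assumed here, is proportional allocation: each coarse percentage is divided among the five-point intervals it contains in the same ratio as those intervals occur in the total-Army distribution. A Python sketch of that assumed procedure, with a hypothetical function name and invented numbers:

```python
def split_coarse(coarse_pct, fine_pct_by_coarse):
    """Split coarse-interval percentages into finer intervals (assumed method).

    coarse_pct: percentage of the control group in each coarse interval.
    fine_pct_by_coarse: for each coarse interval, the percentages that a
        reference distribution (hypothetically, the total Army) places in
        the finer intervals it contains.
    Returns the estimated percentage for every fine interval, in order.
    """
    fine = []
    for pct, ref in zip(coarse_pct, fine_pct_by_coarse):
        ref_total = sum(ref)
        # Allocate this coarse percentage in proportion to the reference shape.
        fine.extend(pct * r / ref_total for r in ref)
    return fine


# Hypothetical: two coarse intervals, each covering two five-point intervals.
coarse = [50.0, 50.0]
reference = [[10.0, 30.0], [45.0, 15.0]]
fine = split_coarse(coarse, reference)
print(fine)
```

Under this reading the coarse totals are preserved exactly, while the shape within each coarse interval is borrowed from the reference distribution; that borrowing is the loss of "elegance" the text acknowledges.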

The experimental sample consisted of 510
AGCT scores of World War II, male, Army
veterans who were or had been hospitalized
at VA, Marion, and who had been diagnosed by
a staff of two or more psychiatrists as suffering
from functional psychiatric disorders. The sample
was obtained by drawing randomly from the
hospital’s register and rejecting cases which
did not meet these criteria.


4 Larger class intervals were used in the studies of
diagnostic groups. These distributions were “telescoped”
in order to maintain a minimum theoretical
frequency of ten.

5 U. S. Army, The Adjutant General’s Office, Personnel
Research Section. Estimated percent distribution
of AGCT scores (Based on 9,757,583 AGCT
scores by AGCT grade). A table prepared in July,
1947, and obtained from the Adjutant General’s
Office.
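Footnote 4 says the subgroup distributions were "telescoped" so that no interval had a theoretical (expected) frequency below ten. The exact pooling rule is not reported; the Python sketch below assumes a simple greedy rule that merges adjacent intervals from left to right and folds any short remainder into the last pooled group. The function name and the example frequencies are hypothetical.

```python
def telescope(observed, expected, min_expected=10.0):
    """Pool adjacent class intervals until each expected frequency >= min_expected.

    Greedy left-to-right pooling (an assumed rule); a short final run is
    folded into the preceding group. If the total expected frequency is
    below min_expected, a single pooled group is returned.
    """
    merged_obs, merged_exp = [], []
    run_obs = run_exp = 0.0
    for obs, exp in zip(observed, expected):
        run_obs += obs
        run_exp += exp
        if run_exp >= min_expected:
            merged_obs.append(run_obs)
            merged_exp.append(run_exp)
            run_obs = run_exp = 0.0
    if run_exp > 0:  # leftover intervals with too little expected frequency
        if merged_exp:
            merged_obs[-1] += run_obs
            merged_exp[-1] += run_exp
        else:
            merged_obs.append(run_obs)
            merged_exp.append(run_exp)
    return merged_obs, merged_exp


# Hypothetical small subgroup (N = 33) spread over six class intervals.
obs = [3, 5, 9, 8, 6, 2]
exp = [2.0, 4.5, 8.0, 9.5, 6.0, 3.0]
print(telescope(obs, exp))
```

Pooling reduces the degrees of freedom (here from 5 to 1), which is consistent with the small df values reported for the smaller diagnostic subgroups.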

In discussion of the project with colleagues 
the writer found a very pertinent question fre- 
quently raised. Suppose some of the patients’ 
AGCT scores were lowered by acute illness at 
the time of taking the test? A substudy was
designed to check this possibility. By
analysis of variance it tested the assumption 
that subjects who had been hospitalized for 
schizophrenia shortly after taking the test 
would tend to have scored lower than schizo- 
phrenics hospitalized after a longer interven- 
ing period of duty. This investigation yielded 
no support for the hypothesis, so there was 
no further screening of the data. 
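The substudy’s logic is a one-way analysis of variance: schizophrenic patients are grouped by the interval between testing and hospitalization, and the F ratio compares between-group with within-group variability in AGCT scores. The source gives neither the grouping nor the scores, so the function name, groups, and values in this Python sketch are hypothetical.

```python
def one_way_anova(groups):
    """Return (F, df_between, df_within) for a list of score lists."""
    all_scores = [x for g in groups for x in g]
    grand_mean = sum(all_scores) / len(all_scores)
    # Between-groups sum of squares: group means around the grand mean.
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    # Within-groups sum of squares: scores around their own group mean.
    ss_within = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(all_scores) - len(groups)
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return f_ratio, df_between, df_within


# Hypothetical AGCT standard scores by time from testing to hospitalization.
short_interval = [88, 95, 102, 90]
medium_interval = [97, 92, 105, 98]
long_interval = [100, 94, 99, 103]
print(one_way_anova([short_interval, medium_interval, long_interval]))
```

An F near 1, as in this invented example, means the group means differ no more than would be expected from the variation within groups.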


Results 


Figure 1 shows graphically the distributions 
for the experimental sample of AGCT scores 
of 510 patients and the control distribution 
based on scores of 290,163 Fifth Service 
Command inductees. Table 1 presents the re- 
sults of comparing these two distributions by 
chi square as well as comparisons to the con- 
trol data of distributions of AGCT scores for 
various diagnostic subgroups. 

The patients were not found to have scored 
lower on the AGCT to an extent that is sta- 
tistically significant. Schizophrenics, as a total 
group, scored significantly low. However, this 
result is misleading, since study of schizo- 
phrenic types reveals that the paranoid and 
catatonic types did not score low. In view of
the often encountered statement that it
requires high intelligence to be paranoid, it
might be well to underline the fact that the
paranoid schizophrenic group showed no
tendency whatever toward higher scores than
the control group. The manic-depressive
group, though including only 25 cases, showed
a statistically significant high-scoring trend.

Fig. 1. Distribution of AGCT standard scores for 510 psychiatric inpatients
(solid) and control distribution based on data for 290,163 Fifth Service
Command inductees (broken). Horizontal axis: mid-points of standard score
class intervals.


Table 1


Chi-square Results: Comparisons to Control Data of AGCT Data for All Patients
and for Diagnostic Subgroups

Group                   N    df   Chi square    p      Trend
All patients           510   16     22.32      .14
All schizophrenics     368   16     32.60      .02     Low scores
  Paranoid              66    3      3.10      .39
  Catatonic            126    7     10.61      .19
  Hebephrenic          122    7     17.55      .02     Low scores
  Simple                33    2     12.46      .002    Low scores
  Other                 21    1      4.77      .05     Low scores
Nonschizophrenics      188   12      7.78      .79
  Alcoholism            29    1      0.61      .90
  Neurosis              79    5      1.93      .20
  Manic-depressive      25    1     17.65      .001    High scores
  Character disorder    55    3      4.21      .25


Discussion


1. Two conclusions may be drawn from the
comparison of the total sample of patients’
scores to the control data: First, no
significant relationship between intelligence
and adjustment is shown. One might remark that
we have shown 86 chances in 100 that “mentally
ill” persons tend to have scored lower than
the general population. However, the facts
that this sample represents an extreme of
maladjustment and that most of its
eccentricity is due to certain schizophrenic types
suggest to the writer that, if a single
continuum, “maladjustment,” were extended into
the total U. S. population, any relationship to
be found between this and intelligence would
be negligible. Second, it is concluded that no
particular level of intelligence is revealed as
optimal for psychological adjustment (no
restricted range along the intelligence
continuum showed a low incidence in this
sample of scores of maladjusted persons). It
should be stated that this latter point has no
bearing on the hypothesis of Hollingworth
(4). Her hypothesis involves intelligence
levels above IQ 130, which this study could not
adequately sample. Thus our findings relative
to the first two hypotheses taken from the
literature are negative.
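The "86 chances in 100" figure is 1 − p for the all-patients comparison in Table 1 (chi square = 22.32, df = 16). When the degrees of freedom are even, the upper tail of the chi-square distribution has the closed form P(χ² > x) = e^(−x/2) Σ (x/2)^i / i!, summed over i = 0, ..., df/2 − 1, so tabled p values can be checked without printed tables. The Python sketch below is illustrative; the function name is hypothetical and it handles even df only.

```python
import math


def chi2_upper_tail_even_df(x, df):
    """P(chi-square > x) for a positive, even number of degrees of freedom."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form requires a positive even df")
    half = x / 2.0
    term, total = 1.0, 0.0
    for i in range(df // 2):
        if i > 0:
            term *= half / i  # builds (x/2)**i / i! incrementally
        total += term
    return math.exp(-half) * total


p = chi2_upper_tail_even_df(22.32, 16)  # all patients, Table 1
print(f"p = {p:.3f}; 1 - p = {1 - p:.2f}")
```

For the all-patients row this gives p of about .13, close to the tabled .14; for the Simple schizophrenic group (chi square = 12.46, df = 2) the same function gives about .002, as in Table 1.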

With respect to the third hypothesis our 
findings are positive. From the investigation 
of intelligence of patients by diagnostic groups 
is drawn the conclusion that intelligence can 
be significantly related to diagnosis. Manic- 
depressive psychosis is associated with high 
AGCT score. Schizophrenia, other than para- 
noid and catatonic types, is associated with 
low AGCT score. This is in accord with the 
findings of the sociologic studies of Duval 
(2), Faris and Dunham (3), and Weiss (10), 
who found schizophrenia to have a higher 
incidence in lower socioeconomic groups, 
manic-depressive psychosis in upper socio- 
economic groups. 

Theoretical speculations to explain the 
findings for different diagnostic groups are 
interesting to make. However, this study was 
undertaken as an empirical rather than a 
theoretical one and provides little or nothing 
for evaluation of hypotheses the speculator 
might formulate. The writer’s favored hy- 
pothesis about the low scoring tendency of 
some schizophrenics is that it is associated 
with personality disorganization of long stand- 
ing, existing prior to acute illness and tend- 
ing to have rendered the persons “function- 
ally unintelligent.” One might hypothesize a 
sharp cleavage between the scores of schizo- 
phrenics divided into reactive vs. process 
groups as discussed by Kantor, Wallner, and 
Winder (5). The trend of paranoid and
catatonic schizophrenics scoring normally,
with all other schizophrenic groups scoring
lower, might become more pronounced if
atypical cases were re-sorted using these
authors’ dichotomy in place of the
Kraepelinian typology. No speculations about the
findings for manic-depressives are volunteered.


Summary 


This study involved drawing a large sample
of male, World War II, Army veterans
from the files of the Veterans Administration
neuropsychiatric hospital at Marion, Indiana.
Army General Classification Test scores were
obtained from the Army for 510 cases, this
test having been taken prior to the onset of
acute illness. The distributions of scores for
this total group and for various diagnostic
subgroups were compared to a control
distribution based on AGCT data from 290,163
Fifth Service Command Army inductees. The
findings were:

1. No definite relationship was found
between intelligence, as measured by AGCT,
and maladjustment as the latter was
represented by the total sample.

2. Within the range of intelligence sampled
by this study, no particular level of
intelligence appeared as optimal for
psychological adjustment.

3. Relationships between intelligence, as
measured by AGCT, and diagnostic
classification were indicated: (a) Schizophrenics,
other than paranoid and catatonic types,
tended to have obtained low AGCT scores.
(b) Manic-depressive psychotics tended to
have obtained high AGCT scores.


Charles F. Mason

A review of reports extending through
approximately 20 years, during which a large
number of testing techniques have been ap- 
plied in efforts to uncover the personality 
correlates of alcohol addiction, must lead to 
the conclusion that results have been most 
meager. Surveys, such as that of Sutherland, 
Schroeder, and Tordella (7) dealing with re- 
searches reported prior to 1950, have gener- 
ally concluded that there is little objective 
evidence indicating that any particular per- 
sonality type is predisposed to alcoholism. A 
search of the literature since 1950 has indi- 
cated no marked change in the status of this 
field of inquiry. 


The Inductive Approach 


Noting the discouraging results obtained in 
the application of standard testing methods, 
some investigators have approached the prob- 
lem inductively; in such studies, the effort 
has been made to develop tests which would 
reliably identify alcohol addicts. The results 
of such tests, when properly validated, were 
expected to produce data the analysis of 
which might lead to a clearer picture of the 
psychic substrates underlying this addiction. 


Among the efforts in this direction in recent years,
the work of Manson (2, 4) in producing two diagnostic
questionnaires yielded initial results of considerable
interest. Manson’s aim was to clarify by
psychometric differentiation specific emotional,
attitudinal, and behavioral patterns in alcohol addicts
which differed significantly from such patterns among
nonalcoholics. His Manson Evaluation (3) was an
attempt to assess certain personality traits by
questions relating to habitual feelings, reactions to
persons and situations, goals and aspirations, physical
states, and emotional trends. In the Alcadd Test
(5), on the other hand, questions were openly
directed toward elucidation of drinking habits and of
affective responses to drinking by the subject and
by others. Both tests were validated on several
hundred alcoholic and nonalcoholic subjects of both
sexes, and the author claimed high validity,
reliability, and predictive power for the two
instruments. Barillas (1) confirmed Manson’s results with
the Alcadd Test only, administering this test to 200
male subjects including alcohol addicts, social
drinkers, and abstainers, and finding that it did reliably
differentiate members of the three groups.


1 Based upon a dissertation submitted in partial
fulfillment of the requirements for the M.A. degree,
Fordham University. The writer wishes to express
his gratitude to William C. Bier, S.J., for his advice
and helpful criticism.

2 Now at Teachers College, Columbia University.


The purpose of the present research (6) 
was the revalidation of both tests, making 
use of female subjects who met carefully de- 
fined and applied criteria. 


Method 


While this revalidation might have been 
attempted using only “alcoholic” and “non- 
alcoholic” classifications of subjects, it was 
apparent that results of possibly greater sig- 
nificance might be obtained if specific groups 
lying along a postulated continuum running 
from no use of alcohol to plainly excessive 
use thereof could be tested. A differentiation 
within the alcoholic group between active 
addicts and members of Alcoholics Anony- 
mous suggested itself, and it further appeared 
that there might be significant differences be- 
tween women who customarily use alcoholic 
beverages and those who rarely or never do 
so. Four test categories were set up: active 
alcoholics, AA members, social drinkers, and 
abstainers. Since the Manson Evaluation was 
designed to reflect personality correlates
underlying the use of alcohol, it was expected
that the group mean for active alcoholics 
would be highest (most deviant), followed 
by those for AA members, social drinkers, 
and abstainers in that order. In the case of 
the Alcadd Test, it was decided that AA 
members would be asked to respond on the 
basis of habits antedating entrance into Alco- 
holics Anonymous, since it seemed of little 
value to obtain a drinking habits report from 
a group of professed abstainers. In this case, 
it was expected that means for both alcoholic 
groups would be high, with those for the other 
groups progressively smaller. 


Criteria applied in the selection of test subjects 
were carefully worked out so as to differentiate dis- 
tinct samples. In the case of the active alcoholics, 
the basic requirement was that the subject be un- 
der treatment in a hospital, sanitarium, or clinic spe- 
cifically for alcoholism. Added conditions were that 
the subject be diagnosed psychiatrically as nonpsy- 
chotic and that last previous consumption of alcohol 
had occurred 5 to 14 days prior to testing. The 
time period was selected so as to minimize the ef- 
fects both of recent overindulgence and of therapy. 
Since psychiatric diagnosis was less probable in the 
other three groups, a practical criterion included for 
members of all three was “without obvious mental 
or physical defect.” In addition, AA members were 
required to have established a record of six months 
uninterrupted sobriety in that organization. Social 
drinkers were defined as women whose self-imputed 
use of alcoholic beverages ranged from “daily” to 
any frequency greater than once each month. In 
setting up the criterion for abstainers it was recog- 
nized that this group may include—besides absolute 
abstainers—persons who may on rare occasions use 
alcohol more as a courteous gesture than for any 
other reason. This criterion therefore included women 
whose use of alcohol ranged from “never” to a maxi- 
mum of once per month. 

A total of 132 subjects was tested. Following the 
discard of some records to make all four groups 
comparable as to age and length of education, 120 
subjects made up the test group, including 24 active 
alcoholics, 34 AA members, 34 social drinkers, and 
28 abstainers. Mean age for the entire group was 
41.63 years and mean length of education was 13.16 
years; applying F tests, comparability of groups in 
these two dimensions was established. 

The test instrument consisted of copies of each of 
the two tests appended to a personal data sheet 
which asked information as to age, occupation, 
length of education, marital status, and drinking 
habits. AA members only were asked for details as 
to their drinking patterns and length of sobriety. 
In the case of tests given to the latter group, the 
Alcadd Test bore the notation “AA members are 
asked to answer this questionnaire on the basis of
habits before entering AA.” 

Most subjects either lived in, or were hospitalized 
in, southwestern Connecticut; a few subjects were 
resident in New York City and a small number of 
active alcoholics were tested in a state alcoholism 
hospital in Hartford, Connecticut. Socioeconomically, 
the group tested appears to have been a middle- 
class one, there being no striking differences in this 
respect noted among the four subgroups. Length of 
continuous sobriety reported by AA members ranged 
from 6 to 176 months, with a mean length of 46 
months and a median of 36 months. 


Results 


Since higher scores indicate greater deviation
on both tests used, the results shown in Table 1
appear to confirm the hypotheses tested in this
research.

It will be noted that group mean scores on 
the Manson Evaluation make up a “ladder 
of scores” in the order: active alcoholics, AA 
members, social drinkers, abstainers. Of in- 
terest is the fact that scores for active alco- 
holics are much higher than those for AA 
members, the latter being closer to those for 
the social-drinker group. It would appear 
that, in terms of whatever dimensions are 
being measured by this test, AA members 
resemble social drinkers more than they re- 
semble active alcohol addicts. 

Results on the Alcadd Test also show such
a “ladder,” but with a reversal in that AA
members obtained a higher mean score than
did the active alcoholics. It will be recalled
that AA members were asked to report their
habits prior to their joining that group. No
definite statement can be made, on present
evidence, as to the reasons for this result;
possible explanations which suggest themselves
include the following: (a) the higher
AA scores may reflect a more accurate
memory of aberrant habits than was possible for
those closer in time to excessive drinking, (b)
there might have been a tendency by the
AA’s to maximize difficulties considered as
overcome, while the active addicts minimized
recent patterns involving strong guilt
feelings, and (c) the drinking patterns of the
AA members in the test group may actually
have been more deviant than those of the
active alcoholics tested.


Table 1


Means and Standard Deviations of Scores for Four Test
Groups, Using The Manson Evaluation
and The Alcadd Test

                           Manson Evaluation     Alcadd Test
Test Group           N     Mean      SD          Mean      SD
Active alcoholics   24     38.6     12.2         25.55    11.9
AA members          34     23.4     12.3         41.4      9.7
Social drinkers     34     15.5      7.4          6.5      4.5
Abstainers          28     10.3      6.2         (illegible)


Statistical significance of the differences
between group means on both tests is indicated
by the t ratios given in Table 2.


Diagnostic Tests for Alcohol Addiction


Table 2


The Manson Evaluation and The Alcadd Test:
t Ratios of Differences Between
Four Group Means

                                           Manson        Alcadd
Comparison                                 Evaluation    Test
Active alcoholics vs. AA members             4.61          5.45
Active alcoholics vs. social drinkers        8.88          8.26
Active alcoholics vs. abstainers            10.48         10.57
AA members vs. social drinkers               3.16         18.57
AA members vs. abstainers                    5.05         21.13
Social drinkers vs. abstainers               2.88          5.88

Note. For the degrees of freedom involved in these ratios, values
of t.01 ranged from 2.65 to 2.68.
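Table 2 reports critical ratios (t) for differences between independent group means. The source does not give the formula; the Python sketch below assumes the common form t = (M1 − M2) / √(SD1²/(N1 − 1) + SD2²/(N2 − 1)). Applied to the Table 1 figures for AA members and social drinkers on the Manson Evaluation, it reproduces the tabled 3.16; other entries may not be matched exactly from the rounded means and SDs, so the function (a hypothetical name) should be read as a sketch of the method.

```python
import math


def critical_ratio(mean1, sd1, n1, mean2, sd2, n2):
    """Critical ratio (t) for the difference between two independent means.

    Assumes each variance term uses N - 1; the source does not state
    which variant was used.
    """
    se_diff = math.sqrt(sd1 ** 2 / (n1 - 1) + sd2 ** 2 / (n2 - 1))
    return (mean1 - mean2) / se_diff


# Manson Evaluation, AA members vs. social drinkers (Table 1).
t = critical_ratio(23.4, 12.3, 34, 15.5, 7.4, 34)
print(f"t = {t:.2f}")
```

The resulting t exceeds the 1 per cent values of 2.65 to 2.68 given in the note to Table 2.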

It will be noted that all critical ratios are 
significant beyond the 1 per cent level.
Discriminations tend to be more marked in 
results on the Alcadd Test, designed to dis- 
criminate among specific habit patterns, than 
those on the Manson Evaluation, which at- 
tempts measurement of less explicit person- 
ality dimensions. Despite the quantitative 
differences between results on the two tests, 
present findings appear to establish the va- 
lidity of both, at least with respect to these 
samples. 

Using Hoyt’s approximation of the Kuder- 
Richardson formula, reliability coefficients 
were established at .92 in the case of the 
Manson Evaluation and at .97 in the in- 
stance of the Alcadd Test. These coefficients 
are almost identical to those established by 
the test author, which were .94 and .96, re- 
spectively. It would thus appear that, tested 
by the method of rational equivalence, both
instruments are highly reliable. 
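Hoyt’s method treats the item responses as a persons × items analysis of variance and estimates reliability as 1 − MS(residual) / MS(persons); for items scored 0 or 1 this gives the same value as Kuder-Richardson formula 20. The Python sketch below computes both on a small, hypothetical response matrix to show the agreement; it illustrates the method and is not a re-analysis of the study’s data. The function names are illustrative.

```python
from statistics import pvariance


def kr20(matrix):
    """Kuder-Richardson formula 20 for a persons x items matrix of 0/1 scores."""
    n_persons, n_items = len(matrix), len(matrix[0])
    var_total = pvariance([sum(row) for row in matrix])  # divides by N
    sum_pq = 0.0
    for j in range(n_items):
        p = sum(row[j] for row in matrix) / n_persons  # proportion passing item j
        sum_pq += p * (1 - p)
    return n_items / (n_items - 1) * (1 - sum_pq / var_total)


def hoyt(matrix):
    """Hoyt's ANOVA reliability: 1 - MS(residual) / MS(persons)."""
    n_persons, n_items = len(matrix), len(matrix[0])
    grand = sum(map(sum, matrix)) / (n_persons * n_items)
    person_means = [sum(row) / n_items for row in matrix]
    item_means = [sum(row[j] for row in matrix) / n_persons for j in range(n_items)]
    ss_total = sum((x - grand) ** 2 for row in matrix for x in row)
    ss_persons = n_items * sum((m - grand) ** 2 for m in person_means)
    ss_items = n_persons * sum((m - grand) ** 2 for m in item_means)
    ss_resid = ss_total - ss_persons - ss_items
    ms_persons = ss_persons / (n_persons - 1)
    ms_resid = ss_resid / ((n_persons - 1) * (n_items - 1))
    return 1 - ms_resid / ms_persons


# Hypothetical responses: 6 persons x 5 items, 1 = keyed (deviant) answer.
responses = [
    [1, 1, 1, 1, 0],
    [1, 1, 1, 0, 0],
    [1, 1, 0, 0, 0],
    [1, 0, 0, 0, 0],
    [1, 1, 1, 1, 1],
    [0, 0, 0, 0, 0],
]
print(f"KR-20 = {kr20(responses):.3f}, Hoyt = {hoyt(responses):.3f}")
```

Both functions print .833 for this matrix, illustrating why the text can treat Hoyt’s procedure as a route to the Kuder-Richardson coefficient.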

Pearson product-moment correlation coeffi- 
cients were derived from the scores of each 
group on both tests. Correlations were not 
significant except in the case of the active 
alcoholic group, for which the correlation 
amounted to +.50, significant at the 5 per
cent level. It would appear that the two tests 
measure essentially dissimilar dimensions, the 
coincidence of which is explicit only in the 
case of the addicts tested. 
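Whether a correlation differs significantly from zero can be judged from t = r√(N − 2) / √(1 − r²) with N − 2 degrees of freedom. The source does not say which test was used, so this Python sketch (with a hypothetical function name) is an assumed check rather than the author’s computation.

```python
import math


def t_for_r(r, n):
    """t statistic for testing a Pearson r against zero, with n - 2 df."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)


# Active alcoholic group: r = +.50 with N = 24.
t = t_for_r(0.50, 24)
print(f"t = {t:.2f} with {24 - 2} df")
```

With 22 df, t of about 2.71 exceeds the two-tailed 5 per cent value of 2.074 but not the 1 per cent value of 2.819, consistent with the level reported above.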

A further inquiry into the results obtained 
concerned the predictive power of each test.
The effectiveness of the two instruments in
the present study is shown in Table 3.

Using the cutoff scores suggested by the 
test author, it is apparent that both tests are 
quite effective in predicting status, with the 
single exception that in this case the Manson 
Evaluation did not effectively identify alco- 
holics who are AA members. As noted above, 
the subjects in this group had a high average 
length of sobriety (46 months) and their 
mean test score on the Manson Evaluation 
was closer to the mean of the social drinkers 
than to that of the active alcoholics. 


Discussion 


Results of this research appear to confirm
the validity and reliability of both tests used,
as applied to female samples. Predictive
powers established were in line with those
claimed by the author, except that the Manson
Evaluation did not identify the status of the
women AA members. Since, in terms of the
dimensions measured by that test, these AA
members resembled the social drinkers tested
rather than the active alcoholics, two possible
implications are suggested: (a) that the
Alcoholics Anonymous program of therapy has
measurable effects, and (b) that alcoholics
with certain personality configurations can
benefit from the AA program while other
alcoholics cannot do so.


Table 3


Predictive Powers of The Manson Evaluation and The
Alcadd Test: Number and Percentage of
Subjects in Four Test Groups
Correctly Identified

                          Manson Evaluation     Alcadd Test
Test Group          N     No.       %           No.       %
Active alcoholics   24     20      80            20      80
AA members          34     14      41            34     100
Social drinkers     34     31      91            32      94
Abstainers          28     27      96            28     100
Total              120     92      77           114      95

It would appear that these tests might be 
of use in a number of practical applications, 
such as military or industrial screening, diag- 
nostic differentiations, or aids to guidance 
counseling. At the same time, certain limita- 
tions of both instruments must be borne in 
mind. While the Manson Evaluation appears 
to be a valid and reliable predictor of a clini- 
cal syndrome, the etiological constituents of 
this syndrome are not presently identified. 
No evidence produced by this study over- 
comes the limitation stated by the test au- 
thor in his original report: “It was not estab- 
lished that this test could differentiate alco- 
holics from nonalcoholic clinical deviants” 
(4, p. 204). As to the Alcadd Test, it is ap- 
parent that the content—dealing primarily 
with the subject’s drinking habits—would 
limit its effective use in any situation which 
placed a premium on evasion. 

It is suggested that further large-scale use 
of these tests, especially use involving other 
deviant groups, might yield data which, sub- 
jected to item and factor analyses, could 
point the way to a more effective under- 
standing of alcohol addiction. 


Donal G. 
Summary


This report details results of research in 
which two questionnaires—the Manson Evalu- 
ation and the Alcadd Test—were used in test- 
ing 120 female subjects making up four test 
groups: active alcoholics, AA members, so- 
cial drinkers, and abstainers. While the Man- 
son Evaluation did not effectively identify 
the AA members in the test sample, all other 
results appear to confirm the clinical validity, 
reliability, and predictive power claimed by 
the author for both instruments. 


Received September 20, 1955. 
A Comparison of the Bender-Gestalt Test and the
Digit-Span Test as Measures of Recall¹


Alexander Tolor²
USAF Hospital, Parks Air Force Base, California 


This exploratory study was designed to in- 
vestigate the ability of patients with organic 
brain disease and patients with psychogenic 
disorders to recall digits and Bender-Gestalt 
Test designs. Clinical experience has led some 
psychologists to conclude that difficulty in re- 
peating digits forward or backward often re- 
flects an organic defect (2, 10). However, the 
retention of digits also seems to be highly 
vulnerable to the effects of anxiety which 
may operate to cause impaired concentration 
and attention (7, 10). It would seem, there- 
fore, that a disproportionately short memory 
span for digits could be ascribed to organic 
impairment or to impairment which is sec- 
ondary to an emotional disturbance. 


The Bender-Gestalt Test has only recently been 
used as a measure of memory for designs. The rela- 
tionship between the number of Bender figures re- 
tained and other psychological variables has not 
been subjected to extensive investigation as yet. 
However, there are a few pertinent studies in the 
literature. Hanvik and Anderson (6) in comparing 
two groups of subjects having brain lesions with a 
group of control subjects found no significant dif- 
ferences between any of the groups as to number of 
figures recalled. Gobetz (4) assumed that, in the ab- 
sence of an objective stimulus, emotional blocking 
would be greater for neurotic than for normal sub- 
jects and that the former would therefore recall 
fewer designs. This hypothesis failed to be supported 
by his data. 


There are a number of differences between digit and Bender recall, some of which are related to the nature of the task, the method of administration, and the type of response required. One of the most important differences pertains to the innocuous appearance of the Bender material and to the lesser social interaction between examiner and subject on the Bender test as compared to the Digit-Span test. The over-all effect is probably one of a more neutral emotional atmosphere during the presentation of Bender figures than during the presentation of digits.

¹ The opinions and conclusions expressed do not necessarily represent those of the Department of the Air Force.

² The data were collected while the author was staff psychologist at Neurological Institute, Columbia-Presbyterian Medical Center, New York City.


Procedure 


All patients were tested individually at the 
Neurological Institute of Columbia-Presby- 
terian Medical Center in accordance with 
the clinical procedures regularly employed by 
the staff of the Psychology Department. 


With the exception of the vocabulary subtest, the 
whole Wechsler-Bellevue Scale, Form I, was
administered routinely. The Bender-Gestalt Test was
presented as suggested by Bender (3). Following the 
copying of the figures, the designs were removed 
from sight and each patient was requested to re- 
produce from memory on a new sheet of paper as 
many of the designs as he could. When the patient 
seemed to have completed the task he was also asked 
to reproduce any parts of the designs that he could 
recall. Only one such additional urging was given. 

Both whole and part figures were counted in com- 
puting the total number of Bender designs recalled. 
The figures which were reproduced from memory 
were also evaluated by the examiner for degree of 
distortion. 

The Digit-Span subtest of the Wechsler-Bellevue 
Scale provided the data on total number of digits 
retained and on the number of digits recalled for- 
ward and backward. The difference between the 
Digit-Span weighted score and the patient’s average 
Wechsler weighted score was also recorded to the 
nearest whole number. 
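The difference score just described can be illustrated with a short Python sketch. It is not the author's procedure: it assumes that the "average Wechsler weighted score" is the mean of the patient's other administered subtests (Digit Span excluded) and that halves round upward. The function name and the patient's scores are hypothetical.

```python
import math

def digit_span_difference(weighted_scores):
    """Digit-Span weighted score minus the mean of the other weighted scores.

    weighted_scores maps subtest name -> weighted score. Assumption: the
    comparison mean excludes Digit Span itself. The result is rounded to the
    nearest whole number, with halves rounded upward.
    """
    digit = weighted_scores["Digit Span"]
    others = [v for k, v in weighted_scores.items() if k != "Digit Span"]
    diff = digit - sum(others) / len(others)
    return math.floor(diff + 0.5)

# Hypothetical patient: Form I subtests other than Vocabulary.
patient = {
    "Information": 10, "Comprehension": 11, "Digit Span": 7,
    "Arithmetic": 9, "Similarities": 10, "Picture Arrangement": 9,
    "Picture Completion": 10, "Block Design": 8, "Object Assembly": 9,
    "Digit Symbol": 8,
}
print(digit_span_difference(patient))
```

A negative value marks a Digit-Span score below the patient's own average level, which is how the group means reported later in the Results should be read.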

All patients were tested prior to the determination of final diagnoses by members of the medical staff. The majority of the patients had been referred because their symptoms suggested the possibility of an organic brain disease. On the basis of their discharge diagnoses, the patients were separated into the following groups: an organic group, a seizure group, and a psychogenic group. The organic group contained 91 patients, all having some type of intracranial pathology. It included 23 patients with brain tumors, 16 with vascular pathologies, 4 with infectious diseases, 15 with head trauma, 27 with degenerative diseases, and 6 with hereditary-degenerative disorders. These classifications are very broad and arbitrary and were made only for the purpose of conveying some idea of the type of cases that were represented.

Fig. 1. Comparison of average number of Bender figures recalled by organic, functional, and convulsive patients at different levels of intelligence.

The psychogenic group contained 49 patients. Be- 
cause of the unreliability of specific diagnostic cate- 
gories no further nosological differentiation was at- 
tempted. However, it should be reported that only 3 
patients were diagnosed as being psychotic, the re- 
mainder suffering from less severe personality dis- 
orders. 

The 35 patients of the convulsive group were se- 
lected because their seizures, irrespective of whether 
they were petit mal, grand mal, or psychomotor, 
were of undetermined etiology. Thus, this popula- 
tion included only patients with idiopathic epilepsy. 


Results 


Table 1 contains information relative to the age, sex, and intelligence distribution of the three groups. There are obviously large age, sex, and intelligence differences among these groups. With respect to sex, only the proportion of males to females of the organic compared to the convulsive group differs significantly (.05 level). However, all the age differences are significant at the .01 level. In intelligence, the differences in mean IQ between the organic and psychogenic groups and between the convulsive and psychogenic groups are significant at the .01 level. The organic and convulsive groups do not differ significantly in intellectual functioning.

Table 1
Composition of the Three Groups

                                 Group
Item             Organic     Convulsive    Psychogenic
                 (N = 91)    (N = 35)      (N = 49)
Sex
  Male              57          14            23
  Female            34          21            26
Age
  Range           12-72       12-56         12-64
  Mean             44.1        27.6          35.4
  SD               15.1        11.9          14.2
W-B Total IQ
  Range          54-129      61-121        79-136
  Mean             90.8        94.2         110.8
  SD               16.7        15.4          13.1
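The group comparisons above can be checked roughly from the Table 1 summary figures. The article does not name its tests, so this sketch assumes an uncorrected Pearson chi-square for each 2 x 2 sex comparison and a Welch-style t computed from the reported means, SDs, and Ns for age and IQ. The critical values are the familiar tabled ones, and the helper names are illustrative.

```python
import math

PAIRS = [("organic", "convulsive"), ("organic", "psychogenic"),
         ("convulsive", "psychogenic")]

# Table 1 figures.
SEX = {"organic": (57, 34), "convulsive": (14, 21), "psychogenic": (23, 26)}  # (male, female)
AGE = {"organic": (44.1, 15.1, 91), "convulsive": (27.6, 11.9, 35),
       "psychogenic": (35.4, 14.2, 49)}                                       # (mean, SD, N)
IQ = {"organic": (90.8, 16.7, 91), "convulsive": (94.2, 15.4, 35),
      "psychogenic": (110.8, 13.1, 49)}

CHI2_05_DF1 = 3.841  # tabled chi-square, df = 1, p = .05
T_01_APPROX = 2.64   # two-tailed t at p = .01 for about 80 df (conservative for more df)

def chi_square_2x2(a, b, c, d):
    """Uncorrected Pearson chi-square for the 2 x 2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

def welch_t(m1, s1, n1, m2, s2, n2):
    """t for a difference between two means, from summary statistics (Welch form)."""
    return (m1 - m2) / math.sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)

sex_chi2 = {(g1, g2): chi_square_2x2(*SEX[g1], *SEX[g2]) for g1, g2 in PAIRS}

for g1, g2 in PAIRS:
    print(f"{g1} vs {g2}: sex chi2 = {sex_chi2[(g1, g2)]:.2f}, "
          f"age t = {welch_t(*AGE[g1], *AGE[g2]):.2f}, "
          f"IQ t = {welch_t(*IQ[g1], *IQ[g2]):.2f}")
```

Under these assumptions only the organic-convulsive sex comparison passes the .05 value (about 5.27 against 3.84), every age t exceeds 2.64, and of the IQ contrasts only organic-convulsive falls short of significance, which agrees with the text.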

A significant and substantial correlation exists between the two measures of memory, as is indicated by the coefficient of correlation of .43 (σr = .06) between digit recall and Bender recall for all groups combined.
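The standard error given for the .43 can be recovered with the classical large-sample formula sketched below. Which variant the author used is not stated; this sketch assumes (1 - r²)/√(N - 1), and dividing by √N instead gives the same .06 to two places. N = 175 is the sum of the three group sizes in Table 1.

```python
import math

def se_of_r(r, n):
    """Large-sample standard error of a Pearson r: (1 - r**2) / sqrt(n - 1)."""
    return (1 - r ** 2) / math.sqrt(n - 1)

N_TOTAL = 91 + 35 + 49  # organic + convulsive + psychogenic patients
r_digit_bender = 0.43
se = se_of_r(r_digit_bender, N_TOTAL)
critical_ratio = r_digit_bender / se  # r expressed in standard-error units
print(round(se, 2), round(critical_ratio, 1))
```

The critical ratio of roughly 7 is what makes the correlation "significant" in the sense used here.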


Bender Recall 


No significant departure from homogeneity of variance was found by Cochran's test, which permitted the use of an analysis-of-variance design. This technique shows that the mean number of Bender figures recalled in the three groups differs to a significant degree (F = 44.38; p < .001). Since the two groups of patients with intracranial abnormalities scored consistently lower in intelligence than those without brain damage, it was necessary to control for the effects of intelligence in determining whether the recall of Bender figures is a function of clinical category. Table 2 shows that in adjusting for the effects of intelligence by analysis of covariance a high degree of significance is maintained.

Table 2
Analysis of Covariance
(Significance of differences among organic, psychogenic, and convulsive groups in Bender recall: effect of intelligence partialled out by analysis of covariance.)

                                   Source of Variance
Item                          Total       Within      Between
Correlation                    .50         .34          .91
df for r                       173         171            1
b_yx value                    .053        .033
Adjusted Sum of Squares    468.407     365.199      103.208
df                             173         171            2
Adjusted Mean Squares                    2.136       51.604

Note. F for adjusted Mean Squares = 24.16; p < .001.
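The analysis of covariance in Table 2 follows the standard one-covariate layout, which the sketch below spells out in plain Python. The adjusted sums of squares are residual sums of squares after regressing recall on IQ, once with a single total regression and once with a pooled within-group slope; their difference is the adjusted between-groups term. The data in `example` are hypothetical; the only figures taken from the article are Table 2's adjusted sums of squares and degrees of freedom, which give back its F of about 24.16. Total df = N - 2 (173 for N = 175) and within df = N - k - 1 (171), as in the table.

```python
from statistics import fmean

def _sums(xs, ys):
    """Sums of squares and cross-products around the means of xs and ys."""
    mx, my = fmean(xs), fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxx, syy, sxy

def f_from_adjusted(ss_between, ss_within, df_between, df_within):
    """F ratio of the adjusted mean squares."""
    return (ss_between / df_between) / (ss_within / df_within)

def ancova_one_way(groups):
    """One-way analysis of covariance with one covariate and a common slope.

    groups: list of (xs, ys) pairs, one per group; x is the covariate (IQ)
    and y the dependent measure (e.g. number of Bender designs recalled).
    """
    all_x = [x for xs, _ in groups for x in xs]
    all_y = [y for _, ys in groups for y in ys]
    n, k = len(all_x), len(groups)
    txx, tyy, txy = _sums(all_x, all_y)            # around the grand means
    within = [_sums(xs, ys) for xs, ys in groups]  # each group around its own means
    wxx = sum(s[0] for s in within)
    wyy = sum(s[1] for s in within)
    wxy = sum(s[2] for s in within)
    adj_total = tyy - txy ** 2 / txx    # residual SS after one total regression
    adj_within = wyy - wxy ** 2 / wxx   # residual SS using the pooled within slope
    adj_between = adj_total - adj_within
    df_within, df_between = n - k - 1, k - 1
    return {
        "adj_total": adj_total, "adj_within": adj_within,
        "adj_between": adj_between, "df_total": n - 2,
        "df_within": df_within, "df_between": df_between,
        "b_within": wxy / wxx,
        "F": f_from_adjusted(adj_between, adj_within, df_between, df_within),
    }

# Table 2's adjusted sums of squares reproduce its reported F.
f_table2 = f_from_adjusted(103.208, 365.199, 2, 171)

# Hypothetical data (IQ, designs recalled) for three small groups.
example = [
    ([80, 90, 100, 85], [3, 4, 4, 3]),
    ([90, 95, 105, 100], [5, 5, 6, 5]),
    ([100, 110, 120, 115], [6, 6, 7, 6]),
]
result = ancova_one_way(example)
print(round(f_table2, 2), round(result["F"], 2))
```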

Figure 1 compares the average number of 
Bender designs recalled at different levels of 
intelligence for the three groups. It will be 
noted that the organic patients consistently 
recall fewer Bender designs than the psycho- 
genic patients and that the convulsive pa- 
tients generally occupy an intermediate po- 
sition irrespective of IQ range. 


Table 3
Average Number of Bender Designs Recalled

Group          Uncorrected    Corrected
Organic            3.48          3.69
Convulsive         5.06          5.15
Psychogenic        5.98          5.53

The average numbers of Bender designs recalled in each of the three groups prior to and following adjustment for the effects of intelligence are shown in Table 3.
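The corrected means in Table 3 appear consistent with the usual covariance adjustment, which moves each group mean along the pooled within-group regression line to the grand mean IQ. The sketch below assumes that formula and uses only published figures: the group sizes and mean IQs from Table 1 and the within-group b values of .033 (Table 2) and .101 (Table 5). Under that assumption it reproduces the corrected Bender means of Table 3 and the corrected total-digit means of Table 6 to within about .01.

```python
# Group sizes and mean W-B total IQs (Table 1).
SIZES = {"organic": 91, "convulsive": 35, "psychogenic": 49}
MEAN_IQ = {"organic": 90.8, "convulsive": 94.2, "psychogenic": 110.8}

# Uncorrected means (Tables 3 and 6) and within-group slopes b_yx (Tables 2 and 5).
BENDER_RAW = {"organic": 3.48, "convulsive": 5.06, "psychogenic": 5.98}
DIGITS_RAW = {"organic": 9.43, "convulsive": 10.97, "psychogenic": 12.20}
B_WITHIN_BENDER = 0.033
B_WITHIN_DIGITS = 0.101

def adjusted_means(raw, b_within, cov_means, sizes):
    """Adjusted mean_j = raw mean_j - b_within * (covariate mean_j - grand covariate mean)."""
    grand = sum(cov_means[g] * sizes[g] for g in sizes) / sum(sizes.values())
    return {g: raw[g] - b_within * (cov_means[g] - grand) for g in raw}

bender_adj = adjusted_means(BENDER_RAW, B_WITHIN_BENDER, MEAN_IQ, SIZES)
digits_adj = adjusted_means(DIGITS_RAW, B_WITHIN_DIGITS, MEAN_IQ, SIZES)

for name, adj in (("Bender", bender_adj), ("Digits", digits_adj)):
    print(name, {g: round(v, 2) for g, v in adj.items()})
```

Because the psychogenic group is well above the grand mean IQ (about 97), its mean is adjusted downward, while the two lower-IQ groups are adjusted upward.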

The relative difficulty in recall of the vari- 
ous Bender designs can be ascertained by in- 
spection of Fig. 2. A chi square computed by 
Wilcoxon’s (11) ranking method is 22.71, 
which yields a p of < .004, indicating that 
the difference between the relative frequencies 
with which the various designs are recalled is 
highly significant. Figure 2 also shows that 
there is a high degree of similarity among 
groups in the relative frequency of recall of 
specific Bender designs. For example, Con- 
figurations 4, 3, and A tend to be most diffi- 
cult for subjects in all groups whereas De- 
signs 8 and 6 tend to be easiest. 
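To see what a chi square of 22.71 implies, an upper-tail probability is needed. For an even number of degrees of freedom the chi-square tail has an exact closed form, used below. The article does not report df; the sketch assumes df = 8 on the grounds that nine designs (A and 1 through 8) were ranked. Under that assumption the tail probability comes out just under .004, in line with the reported p.

```python
import math

def chi2_sf_even_df(x, df):
    """P(chi-square > x) for a positive even df, using the exact closed form
    exp(-x/2) * sum_{i=0}^{df/2 - 1} (x/2)**i / i!."""
    if df <= 0 or df % 2:
        raise ValueError("df must be a positive even integer")
    half = x / 2.0
    term = total = 1.0
    for i in range(1, df // 2):
        term *= half / i  # builds (x/2)**i / i! incrementally
        total += term
    return math.exp(-half) * total

# Assumption: nine designs (A and 1 through 8) give df = 9 - 1 = 8.
p_designs = chi2_sf_even_df(22.71, 8)
print(f"{p_designs:.4f}")
```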

When the three groups are compared for the number of patients who produce severe Bender design distortions and minor distortions on recall, highly significant differences are found (χ² = 14.76, p < .001). Table 4 contains data indicating that the relative frequency of severe distortions on recall is far greater in the organic group than in the other groups.

Table 4
Frequency of Bender Distortions in the Three Groups

Distortion    Organic (N = 88)*    Convulsive (N = 35)    Psychogenic (N = 49)
Severe               33                     3                       7
Minor                55                    32                      42

* Three patients failed to recall any Bender figures and were therefore omitted in this analysis.
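The Table 4 comparison is a 3 x 2 contingency table, so a chi-square test of independence is the natural check. The sketch below applies the textbook uncorrected Pearson statistic to the published counts. It gives about 15.4 rather than the reported 14.76, and the article does not say how its value was computed. With df = 2 the tail probability is exactly exp(-χ²/2), and both values fall below .001.

```python
import math

# Table 4 counts: (severe, minor) distortions per group.
TABLE_4 = {"organic": (33, 55), "convulsive": (3, 32), "psychogenic": (7, 42)}

def pearson_chi_square(rows):
    """Uncorrected Pearson chi-square and its df for an r x c table of counts."""
    col_totals = [sum(col) for col in zip(*rows)]
    n = sum(col_totals)
    stat = 0.0
    for row in rows:
        row_total = sum(row)
        for observed, col_total in zip(row, col_totals):
            expected = row_total * col_total / n
            stat += (observed - expected) ** 2 / expected
    df = (len(rows) - 1) * (len(col_totals) - 1)
    return stat, df

stat, df = pearson_chi_square(list(TABLE_4.values()))
# With df = 2 the upper-tail probability is exactly exp(-x / 2).
p_recomputed = math.exp(-stat / 2)
p_reported = math.exp(-14.76 / 2)
print(round(stat, 2), df, p_recomputed < 0.001, p_reported < 0.001)
```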


Digits Recalled 


Since Cochran’s test confirmed the assump- 
tion of equal variance, an analysis-of-vari- 
ance technique could be applied to the data 
on the total number of digits recalled by the 
subjects in each of the three groups. The F of 
20.17 yields a highly significant p of < .001. 
Table 5 illustrates that when the effects of 
intelligence are controlled for by analysis of 
covariance the adjusted means still differ sig- 
nificantly. 
Fig. 2. Proportion of subjects in each of the groups recalling the different Bender designs.


Table 5
Analysis of Covariance
(Significance of differences among organic, convulsive, and psychogenic groups in total number of digits recalled: effect of intelligence partialled out by analysis of covariance.)

                                   Source of Variance
Item                          Total       Within      Between
Correlation                    .70         .63          .94
df for r                       173         171            1
b_yx value                    .108        .101
Adjusted Sum of Squares     696.38      656.94        39.44
df                             173         171            2
Adjusted Mean Squares                     3.84        19.72

Note. F for adjusted Mean Squares = 5.13; p < .01.


When analysis-of-variance designs are ap- 
plied to data on the recall of digits forward 
and to the recall of digits backward, the following values are obtained, respectively:
F = 16.11, p < .001, and F = 15.47, p < .001.
In both sets of data when adjustments are 
made for effects of intelligence by analysis of 
covariance, significance drops to the .05 level. 
The average number of total digits recalled, 
and the number of digits retained forward 
and backward by the subjects in each of the 
three groups, are presented in Table 6. 

The final comparison among groups is based on averages of differences between each patient's weighted Digit-Span subtest score and his Wechsler-Bellevue average weighted score. An analysis of variance for the three groups yields an F of 9.32, equivalent to p < .01. The means for the organic, convulsive, and psychogenic groups are -.69, -.09, and -.37, respectively. The data show that the convulsive patient tends to do approximately as well on the Digit-Span subtest as on the other Wechsler subtests. The psychogenic patient tends to have some degree of impairment in rote memory for digits, whereas the organic patient demonstrates the greatest impairment relative to his functioning in other areas. The difference between the organic and convulsive groups is significant at better than the .01 level, the difference between the organic and functional groups is significant at the .05 level, but the difference between the convulsive and functional groups is not statistically significant.

Table 6
Average Number of Digits Recalled

Digit Recall          Organic    Convulsive    Psychogenic
Total
  Uncorrected           9.43       10.97          12.20
  Corrected            10.07       11.26          10.81
Forward
  Uncorrected           5.66        6.37           6.98
  Corrected             5.94        6.49           6.37
Backward
  Uncorrected           3.77        4.60           5.22
  Corrected             4.12        4.76           4.45


Discussion 


The results of this study indicate that there 
is a higher correlation between number of 
digits recalled and intelligence (r = .70) than 
between the number of Bender figures recalled
and intelligence (r = .50). When the
digit-recall results are analyzed further, one 
finds a .60 correlation between intelligence and 
digits forward and a .65 correlation between 
intelligence and digits reversed. These corre- 
lations are somewhat higher than the r of .51 
between intelligence and the Digit-Span test 
reported by Wechsler (10). 

Although there is general agreement on the 
correlation between intelligence and number 
of digits recalled, there are some contradic- 
tory findings in the literature on the relation- 
ship between intelligence and Bender recall. 
Aaronson, Nelson, and Holt (1) obtained 
an r of .03 between MA on the Shipley- 
Hartford and Bender recall, and an r of 
-.10 between CQ and Bender recall. On the
other hand, Peek and Olson (8) found a 
product-moment correlation of .34 between 
CQ and number of Bender designs recalled, 
and a correlation of .19 between the Shipley- 
Hartford and number of Bender figures re- 
tained. The present study would seem to in- 
dicate that a substantial relationship exists 
between intellectual functioning and Bender 
recall, one which is, however, less than the 
relationship between intellectual functioning 
and digit recall. 
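The comparison of the two correlations with intelligence (.70 against .50) is made on the same patients, so the two rs are not independent. One classical way to test such a difference is Hotelling's t for two correlations that share a variable, sketched below. The test is not reported in the article; the sketch assumes the total-sample correlations from Tables 2 and 5, the digit-Bender r of .43, and N = 175.

```python
import math

def hotelling_t(r12, r13, r23, n):
    """Hotelling's t (df = n - 3) for comparing r12 with r13, two correlations
    that share variable 1 and are measured on the same n cases."""
    # Determinant of the 3 x 3 correlation matrix.
    det = 1 - r12 ** 2 - r13 ** 2 - r23 ** 2 + 2 * r12 * r13 * r23
    return (r12 - r13) * math.sqrt((n - 3) * (1 + r23) / (2 * det))

# 1 = intelligence, 2 = digits recalled, 3 = Bender designs recalled.
t = hotelling_t(0.70, 0.50, 0.43, 175)
print(round(t, 2), "on", 175 - 3, "df")
```

Under these assumptions t comes to roughly 3.6 on 172 df.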

The data indicate also that the Bender re- 
call is superior to digit recall as a measure of 
discrimination between organicity and nonorganicity.
However, with neither test are the
differences among groups sufficiently large to 
permit prediction in individual cases. 

The fact that the difference in recall be- 
tween the organic and functional groups is 
greater on the Bender test than on the Digit- 
Span test seems to support the impression 
that the Bender is less susceptible to the ef- 
fects of anxiety than the Digit-Span test. 

It is interesting that the results obtained 
on the relative difficulty in recall of various 
Bender designs differ somewhat from those 
reported by Goodstein et al. (5). In both 
studies Designs 3 and 4 were found most 
difficult, but the ease with which Figures 8 
and 6 were recalled by the patients in all of 
our groups was at variance with the results 
of the other study. Furthermore, the sub- 
jects used in the present study did not find 
the first three designs to be particularly easy. 
On the contrary, considerable difficulty was 
encountered in reproducing Design A. These 
differences could be attributed to differences 
in subjects used or to differences in experi- 
mental methodology. Inasmuch as there is 
some evidence (9) that the various designs 
differ in symbolic significance, it might be 
useful to undertake an investigation of the 
relationship between design difficulty and the 
emotional value which the designs have for 
subjects. 


Summary 


Patients having organic brain disease, idiopathic
seizures, and psychogenic disorders
were compared for their performance on the 
Digit-Span test and on the Bender-Gestalt 
Test. The results indicated that there are 
significant differences among groups on both 
measures of immediate recall. These differ- 
ences hold even when adjustments are made 
for the effects of intelligence. In Bender recall 
the psychogenic group does best, the organic 
group does least well, whereas the convulsive 
group occupies an intermediate position. On 
the Digit-Span test the order of recall is convulsive
group, psychogenic group, and organic
group, with the latter retaining the fewest 
digits. The differences among groups are more 
pronounced on the Bender test than on the 
Digit-Span test. The level of difficulty of in- 
dividual Bender designs was also determined. 
Some of the results were discussed in terms 
of the differential susceptibility of the meas- 
ures of recall to the effects of anxiety. 


Received September 20, 1955. 
The Relation of the MMPI to the Edwards Personal
Preference Schedule on a College
Counseling Center Sample¹
The Minnesota Multiphasic Personality Inventory (MMPI) is a well-known test
instrument which is frequently used in counseling
college students even though its normative 
data are based on a general adult population. 
Most of the scales of this test were con- 
structed empirically by selecting items which 
differentiated a diagnosed adult clinical sam- 
ple from a normal adult sample (10). The 
purpose of the research to be reported here 
is to compare the MMPI with a new per- 
sonality test standardized on a college popu- 
lation and designed to measure the relative 
strength of 15 personality needs or dimen- 
sions. 

The new test is the Edwards Personal 
Preference Schedule (PPS). It consists of 210 
forced-choice items. Each pair of items is 
matched approximately for mean social de- 
sirability value, as previously judged by col- 
lege students, to minimize the effect of social 
desirability on item choice. Items measuring 
one need are paired twice with each of the 
remaining 14 needs; hence the maximum raw 
score on any need is 28. Since the items are 
paired, the total raw score on the test will be 
the same for all persons. Thus the PPS re- 
flects only the relative strength of competing 
needs and attitudes rather than the absolute 
strength of any need. Hence this test clearly 
has the advantages as well as the disad- 
vantages of an ipsative scale. 
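The ipsative property described above follows directly from forced-choice scoring, as this sketch shows. It assumes a simplified design in which each of the 15 needs (abbreviated as in Table 1) is paired twice with every other need, giving 210 items and omitting the duplicated consistency pairs; the item order and the chooser functions are hypothetical stand-ins for a real form and respondent. Whatever the choices, the 15 raw scores sum to 210 and none can exceed 28, and the PPS means reported in Table 1 sum to 210.01, as the design requires.

```python
import itertools
import random

# The 15 PPS needs, abbreviated as in Table 1.
NEEDS = ["Ach", "Def", "Ord", "Exh", "Aut", "Aff", "Int", "Suc",
         "Dom", "Aba", "Nur", "Chg", "End", "Het", "Agg"]

def build_items(needs, repeats=2):
    """Each need paired `repeats` times with every other need: 105 * 2 = 210 items."""
    return [pair for pair in itertools.combinations(needs, 2) for _ in range(repeats)]

def score(items, choose):
    """One point to whichever need the respondent picks in each forced choice."""
    scores = dict.fromkeys(NEEDS, 0)
    for pair in items:
        scores[choose(pair)] += 1
    return scores

items = build_items(NEEDS)
rng = random.Random(0)
random_scores = score(items, rng.choice)            # hypothetical random respondent
first_scores = score(items, lambda pair: pair[0])   # always picks the first need listed

# PPS means from Table 1 (N = 155): ipsative scoring forces a total of 210.
TABLE_1_PPS_MEANS = [16.95, 11.92, 11.23, 13.60, 15.01, 14.13, 16.22, 10.27,
                     17.05, 13.11, 13.36, 15.88, 13.74, 15.45, 12.09]
print(len(items), sum(random_scores.values()), round(sum(TABLE_1_PPS_MEANS), 2))
```

Because every point given to one need is a point withheld from another, a high score on any scale can only be read relative to the person's other scales.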

The needs measured by the PPS are those indicated in the manual (4), which gives Edwards' description of the variables. The PPS sets out to measure the needs defined by Murray (14) and uses similar nomenclature in titling the needs. Test items were originally chosen for the scales on an a priori basis in consultation with other psychologists.

¹ Based, in part, on a paper presented at the convention of the American Psychological Association, San Francisco, California, 1955.

To obtain an estimate of the degree of relationship between these two instruments, tetrachoric correlations were computed. The sample of subjects was dichotomized as close to the median score of the group as possible; the most unequal of the dichotomies split the sample 44 per cent vs. 56 per cent. Goheen's method (5) of estimating tetrachoric rs was used. Since tetrachoric rs are less reliable than Pearson rs, Pearson rs were computed when the relationship between two variables, as reflected by the tetrachoric r, appeared very highly significant. The Pearson rs were somewhat lower than the tetrachoric rs, but the level of significance was not changed. Correlations were computed not only for the standard validating and clinical scales of the MMPI, but also for ten experimental scales derived from the MMPI. These were Gough's Dominance, Responsibility, and Status scales (7, 8, 9), Drake's Social-Introversion (3), Navran's Dependence (15), Taylor's Manifest Anxiety (16), Winne's Neuroticism (17), Cook and Medley's Pharisaic-Virtue and Hostility scales (2), and an unpublished Social Desirability scale constructed from MMPI items by Edwards (4).
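Goheen's worksheet method is not reproduced here. As a swapped-in illustration of what a tetrachoric r estimates from median-split data, the sketch uses the well-known cosine-pi approximation, r_t ≈ cos(π / (1 + √(ad/bc))), for a 2 x 2 table with cells a, b, c, d. It is an approximation, not the procedure used in the article; the function name and the example counts are hypothetical.

```python
import math

def tetrachoric_cos_pi(a, b, c, d):
    """Cosine-pi approximation to the tetrachoric r for the 2 x 2 table

                  Y above median   Y below median
    X above            a                b
    X below            c                d

    r_t ~= cos(pi / (1 + sqrt(a*d / (b*c)))).
    """
    if b * c == 0:   # an empty off-diagonal cell: treat as +1
        return 1.0
    if a * d == 0:   # an empty diagonal cell: treat as -1
        return -1.0
    return math.cos(math.pi / (1 + math.sqrt((a * d) / (b * c))))

# Hypothetical counts for 155 clients split near the median on two scales.
print(round(tetrachoric_cos_pi(50, 28, 27, 50), 2))
```

When ad = bc (no association) the estimate is exactly zero, and it approaches ±1 as one diagonal empties, which is the behavior a tetrachoric estimate should have.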

The sample consisted of all available presumably valid records of males seen at a university counseling center for vocational or educational advisement who had taken both the MMPI and PPS during the same counseling contact. Following Hathaway and Meehl's suggestion (11), records with L ≥ 10, F ≥ 16, or ? > 30 were eliminated (4 Ss).
In addition, since F — K scores of 21 or more 
are most frequent in selection situations where 
persons are deliberately attempting to make 
good impressions (12), records where F — K 
was = 21 were eliminated (6 Ss). One other 
criterion for inclusion in the sample was used. On the PPS there are 15 item pairs which are exact duplicates; the number of agreements on these 15 items is used as a measure of the consistency of response. Edwards suggests that with a consistency score of 8 or less the scale scores are highly suspect. Six per cent of his male normative sample obtained consistency scores of 8 or less. To find a similar cutting point on the MMPI, a 10 per cent random sample of 1,500 male MMPI records was scored on the 16 duplicate items of the MMPI. For these 150 records, 5 per cent of the Ss obtained consistency scores of 12 or less. Hence records were eliminated where the PPS consistency score was 8 or less or where the derived MMPI consistency score was 12 or less (12 Ss). Of the original 177 available records, 155 remained. These included veterans and nonveterans, college (82.6 per cent) and noncollege clients. The most typical client was a single, 23-year-old college sophomore.
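The consistency screen can be sketched as a count of agreements between the first and repeated presentations of the duplicated items, followed by the two cutoffs given in the text (PPS 8 or less, MMPI 12 or less). The response codes and function names are hypothetical; only the item counts and cutoffs come from the text.

```python
def consistency_score(first, repeat):
    """Count duplicated items answered identically on both presentations."""
    if len(first) != len(repeat):
        raise ValueError("response lists must have the same length")
    return sum(a == b for a, b in zip(first, repeat))

PPS_DUPLICATES, PPS_CUTOFF = 15, 8     # drop the record if score <= 8
MMPI_DUPLICATES, MMPI_CUTOFF = 16, 12  # drop the record if score <= 12

def passes_consistency(pps_first, pps_repeat, mmpi_first, mmpi_repeat):
    """True if both consistency scores are above the cutoffs used in the text."""
    if len(pps_first) != PPS_DUPLICATES or len(mmpi_first) != MMPI_DUPLICATES:
        raise ValueError("expected 15 PPS pairs and 16 MMPI items")
    pps = consistency_score(pps_first, pps_repeat)
    mmpi = consistency_score(mmpi_first, mmpi_repeat)
    return pps > PPS_CUTOFF and mmpi > MMPI_CUTOFF

# Hypothetical client: 9 of 15 PPS pairs and 13 of 16 MMPI items agree.
pps_a, pps_b = ["A"] * 15, ["A"] * 9 + ["B"] * 6
mmpi_a, mmpi_b = [True] * 16, [True] * 13 + [False] * 3
print(passes_consistency(pps_a, pps_b, mmpi_a, mmpi_b))
```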

Table 1 shows the means of the sample in the standard T scores of the usual MMPI scales and of the PPS. T score means are not shown for the experimental MMPI scales since such scores are not available on all of these scales and since, where they are available, they are based on quite different populations. Table 1 shows that persons who come to a counseling center for educational or vocational counseling differ considerably from the respective normative samples, which is to be expected. Also, as one would expect, deviations from the normative group are more marked on the MMPI, where the normative population is a general adult sample, than on the PPS, where the normative population is a college sample. This finding is consistent with other studies (6) of the MMPI which have indicated that college students in general differ from the MMPI normative group. On the PPS the counseling clients deviate most markedly from the normative college males on three scales: they rate themselves especially low on the heterosexuality scale and especially high on the achievement and endurance scales. These differences are interesting and are developed in other research (13), but here our concern is with a comparison of the two tests.

Table 1
Client Score Distributions on the PPS and MMPI (N = 155)

         PPS                          MMPI                        Experimental MMPI
Scale    Mean      σ     T*     Scale     Mean      σ     T*     Scale        Mean      σ
Ach     16.95   4.05    53      L         3.34   1.95    50      Dom         16.87   3.55
Def     11.92   3.36    52      F         4.70   2.97    55      Resp        21.21   3.53
Ord     11.23   4.47    52      K        15.05   4.52    55      Status      22.58   3.59
Exh     13.60   3.78    49      F − K   −10.38   6.30            Si          27.04   0.14
Aut     15.01   4.10    51      Hs        4.35   3.64    49      Dep         20.29  10.05
Aff     14.13   4.24    48      Hs+K     12.12   3.47    52      Man. Anx.   15.73   7.98
Int     16.22   4.99    50      D        20.22   5.18    58      Neur         5.50   4.01
Suc     10.27   4.50    48      Hy       20.48   4.58    56      Pv          16.62   7.68
Dom     17.05   5.04    49      Pd       17.02   5.13    58      Ho          17.62   7.68
Aba     13.11   5.34    52      Pd+K     23.06   4.94    60      Soc. Des.   30.79   5.63
Nur     13.36   4.68    48      Mf       27.08   5.20    63
Chg     15.88   4.37    51      Pa        9.27   2.89    53
End     13.74   5.77    53      Pt       13.48   8.10    54
Het     15.45   5.84    45      Pt+K     28.58   5.94    62
Agg     12.09   4.84    48      Sc       12.34   7.88    53
                                Sc+K     27.46   6.38    59
                                Ma       17.15   3.88    57
                                Ma+K     20.14

* T scores are approximations.

Table 2 presents the tetrachoric correla- 
tions between the MMPI and the PPS scales. 
Most of the correlations found are not sta- 
tistically significant. However, more signifi- 
cant correlations were obtained than one 
would expect by chance alone. In addition, 
using the technique suggested by Brozek and 
Tiede (1), the probability of the compound
event of obtaining m significant critical ratios
in a series of N tests of significance was computed.
The critical ratio obtained was 16.33
so that the number of significant relationships 
found is far beyond chance expectations. 
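The compound-event reasoning attributed to Brozek and Tiede can be illustrated with a binomial model: if each of N independent tests has probability α of reaching significance by chance, the count m of significant results has mean Nα and variance Nα(1 - α), and the critical ratio compares the observed m with that expectation. This is an assumed reading of the technique, not a transcription of it; the scales correlated here are not independent, and the m and N below are hypothetical.

```python
import math

def excess_significance_cr(m, n_tests, alpha=0.05):
    """Critical ratio (m - N*alpha) / sqrt(N*alpha*(1 - alpha)) for observing m
    significant results among n_tests, assuming independent tests that are each
    significant by chance with probability alpha (normal approximation to the
    binomial)."""
    expected = n_tests * alpha
    sd = math.sqrt(n_tests * alpha * (1 - alpha))
    return (m - expected) / sd

# Hypothetical: 100 correlations tested at the .05 level.
print(round(excess_significance_cr(5, 100), 2))   # exactly the chance expectation
print(round(excess_significance_cr(15, 100), 2))
```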

Where significant correlations exist, the di- 
rection of the relationship makes sense with 
clinical expectations developed from MMPI 
research. For example, Change correlates 
positively with Ma; Aggression correlates 
negatively with L and K, positively with non-K-corrected Pd; Abasement correlates
positively with F, negatively with K, positively
with non-K-corrected Pt and Sc. Three PPS 
scales are of particular interest in relation to 
the usual MMPI clinical scales. Abasement is 
most clearly related to the clinical scales. On 
the other hand, Deference and Dominance 
are inversely related to the clinical scales. 
Clients who rate their guilt and inadequacy 
feelings relatively high on the PPS are most 
likely to be abnormal as measured by the 
MMPI. Clients who have accepted either a 
dominant, strong role or a more conforming, 
follower role in interpersonal relationships on 
the PPS tend to be least like deviants as 
measured by the MMPI. Most of the PPS 
scales, however, show little relationship with 
the MMPI clinical scales. 

The effect of the K correction of the MMPI 
on the correlations is of interest. College coun- 
selors have sometimes felt that K-corrected 
scores underestimated the adjustment level 
of college students since, in general, they 
tend to obtain high K scores. To a degree 
this objection probably still holds when a 
correction developed from abnormal groups 
is applied to the responses of fairly normal 
groups; nonetheless, the K-corrected scores 
appear of value in using the MMPI with the 
latter groups since the K scale does tend to 
correct for test-taking attitudes. 


On the PPS the social desirability stereotype of the items is partially controlled, so that it is less easy to respond only in terms of culturally approved responses. The K correction tends to have the same effect on the MMPI scales, as indicated by the correlational data of this study. Of the nine rs significant at the 5 per cent level when MMPI scales are not K corrected, only one remains significant at the 5 per cent level when the K correction is applied; of the nine rs significant at the 1 per cent level, two remain significant at the 1 per cent level when the K correction is applied, three drop to the 5 per cent level of significance, and the remaining four become nonsignificant. Thus a partial control of the biasing effects of the cultural stereotype of normal behavior on the MMPI clinical scales reduces the relationship between the MMPI and the PPS.

The results of the correlation of the PPS 
with the MMPI experimental scales are quite 
striking. The significant correlations are again 
generally consistent with clinical expectations 
but relatively low. The two Dominance and 
Hostility scales correlate positively but to a 
lesser degree than one might expect. Respon- 
sibility correlates negatively with Aggression, 
but the expected negative correlations with 
Autonomy and Change are too low to be sig- 
nificant. Attitudes characteristic of persons 
scoring high on the Status scale are directly 
related to Dominance and inversely related to 
Succorance and Abasement. These findings 
are consistent with our usual cultural patterns. 

Of general interest is the consistency of the direction of the correlations of the MMPI experimental scales with the PPS if the Social Desirability, Status, Responsibility, and Dominance scales are viewed as measuring socially approved or “healthy” behavior and the remaining scales are viewed as measuring socially disapproved or “sick” behavior. Of the PPS scales, Dominance, which is probably most closely related to a normal MMPI profile, has positive relationships with the “healthy” MMPI experimental scales and negative correlations with the “sick” scales. Of the PPS scales, Abasement, which is most closely related to an abnormal MMPI profile, has the opposite relationship: negative relationships with the “healthy” scales and positive relationships with the “sick” scales. Exhibition and Aggression show similar inverse patterns where the correlations reach a significant level. It is noteworthy that the attempt by the PPS to rule out the social desirability preference value of an item is not entirely successful; four of the correlations between the Social Desirability scale based on the MMPI and PPS variables are significantly above zero.

The pool of Ss was too small to obtain sta- 
tistically reliable differences on MMPI scales 
between Ss with varying profiles on the PPS. 
However, some observations on such profiles 
appear pertinent. A pattern that occurs rather 
frequently is one with Deference and Endur- 
ance high and Autonomy and Aggression low 
or vice versa. Those with the high Aggres- 
sion-Autonomy pattern tend to give a much 
more deviant MMPI profile than those with 
a high Deference-Endurance profile. The Aggression-Autonomy
group tends to be lower on
the “healthy” experimental scales and higher 
on the “sick” experimental scales. Those with 
high Succorance scores are quite similar to 
those with high Abasement scores. It ap- 
peared that a high Succorance plus a high 
Heterosexuality score or, to a lesser extent, a
high Abasement score plus a high Hetero- 
sexuality score, identified those in this group 
who scored especially high on the Depend- 
ence scale. Again the data suggest that a high 
Heterosexuality score on the PPS is related 
to lack of adjustment rather than to adjust- 
ment. There seems to be a need to deny con- 
forming, socially acceptable attitudes rather 
than a lack of anxiety about sexual impulses. 

In summary, the correlations between the 
scales of the PPS and the MMPI on a col- 
lege counseling center sample are not high. 
The relative ratings of a client’s needs as he 
views them on the PPS may or may not in- 
dicate the presence of personality symptom 
formation as defined by the MMPI. This 
finding is not surprising in view of the fact 
that the PPS is an ipsative scale which does 
not reflect the absolute level of need and the 
fact that psychiatric diagnoses, crucial to the 
construction of the MMPI, are often related 
more to the manner of satisfying need than 
to the nature of the needs themselves. How- 
ever, where relationships are found they tend 
to be consistent with MMPI research data. 
Both tests will probably be found useful in
college counseling centers. Just as the Allport 
Study of Values contributes something unique 
when used in combination with the Strong 
Vocational Interest Blank, so does the PPS 
make a contribution distinct from that of the 
MMPI. The PPS shows the relative weight a 
person gives to various personal needs and 
the MMPI the degree of response similarity 
to well-defined clinical groups. 


Received October 17, 1955. 
A potentially fruitful and commonly accepted
clinical hypothesis is the formulation
that a discrepancy between an individual’s 
conception of his real self and ideal self is 
indicative of maladjustment and that as psy- 
chotherapy proceeds the discrepancy between 
the two selves should decrease. The present 
investigation was undertaken to highlight the 
necessity for controlling an important vari- 
able in the valid testing of this hypothesis, 
namely, that of social desirability. 

In obtaining a discrepancy index between 
real self and ideal self two similar methodo- 
logical procedures have been commonly em- 
ployed. One method (2) consists in having 
an individual rate a number of dynamic per- 
sonality traits on, say, a 7-point rating scale 
under two instructional conditions. The per- 
son is asked first to rate his conception of his 
real self and secondly to rate his notion of 
his ideal self. The discrepancy measure is 
provided by the sum of the absolute differences
between the pairs of trait items on the
real-self and ideal-self ratings. Another pro- 
cedure for obtaining an operational index of 
discrepancy has been through Q technique 
(7). In this method, the individual sorts a 
given group of statements or traits into suc- 
cessive intervals that follow a quasi-normal 
frequency distribution. That is, the person is 
instructed to place a specified number of 
statements into the different categories on a 
continuum from most descriptive to least de- 
scriptive. The individual performs two Q 
sorts, a real-self sort and an ideal-self sort. 
The correlation between these two Q sorts 
provides the index of discrepancy. The lower
the correlation is, presumably, the greater
the maladjustment.

1 This article represents a modification of a paper
presented at the American Psychological Association,
San Francisco, 1955.

Edwards and Horst (4) have hypothesized, 
on the basis of Edwards’ finding that social 
desirability correlates highly with the prob- 
ability of endorsement of personality ques- 
tionnaire items, that the social desirability 
factor should operate in Q sorts. That is, 
statements which are socially desirable should 
be judged by individuals as most character- 
istic of themselves. By an extension of their 
hypothesis, it may be postulated that social 
desirability would enter also as an obtrusive 
variable into the rating-scale procedure for measuring
real-ideal self discrepancies. In fact, it is not
unreasonable to suppose that some of the dis- 
crepant findings between various studies in 
this area may have occurred as a partial or 
direct result of the failure to control for this 
variable. For instance, while the studies of 
Bills (1) and Roberts (6) have offered some 
positive support for the hypothesis that a 
discrepancy between real self and ideal self 
is indicative of personality disturbance, Zim- 
mer’s (8) investigation yielded negative re- 
sults. However, the latter study would be 
open to criticism if it were shown that the 
personality traits employed were “loaded” 
with the social desirability variable. It is one 
of the purposes of this study to show that 
such was the case. 

A secondary purpose of the present investi- 
gation is to determine whether or not social 
desirability enters to the same extent into 
personality questionnaire items, rating scale 
items, and items in Q sorts. There was the 
initial expectancy that social desirability 
would play a differential role in the three 
personality-evaluation techniques. The ra- 
tionale for this expectancy was simply that 
individuals would censor their self-descriptions
to a greater extent in the more highly
structured techniques than in the less struc- 
tured ones. The order of structure, from most 
to least, would seem to be from the question- 
naire items, to the rating scales, and then to 
the Q sorts. 


Method 


Social desirability ratings. Twenty-five self 
concepts, used by Zimmer (8), were scaled 
for their social desirability on a 7-point scale 
by 67 undergraduate university students (28 
females and 39 males). Zimmer’s criteria for 
these traits were that they must (a) occur 
between six and ten times in one million words,
as judged by the Thorndike-Lorge word
count; (b) reflect dynamic content; and (c)
not represent “a cultural stereotype of a de- 
sirable or undesirable trait” (8, p. 447). No 
explicit reference was made as to how the 
latter two criteria were assessed. 

The instructions to the judges were to rate 
each trait on a continuum of social desir- 
ability. The median rating for each trait 
served as its social-desirability scale value. 
When these traits are ordered from most so- 
cially desirable to least socially desirable, 
the following rank order is obtained: respect- 
ful, energetic, orderly, refined, ambitious, 
trusting, spontaneous, precise, determined, 
obedient, economical, ardent, persistent, de- 
liberate, cautious, conventional, leisurely, dar- 
ing, sentimental, poetic, emotional, wary, 
dominant, meek, and lusty. 

Questionnaire endorsement. Each of the 25
trait terms was written in the form of a positive
assertion, such as, “I am respectful,” “I
am energetic,” and so on. Another independ- 
ent sample of 65 university students, com- 
posed of 31 females and 34 males, answered 
these personality questionnaire items as to 
whether or not they were characteristic of 
themselves. As in Edwards’ study (3), the 
proportion of subjects stating that each trait 
was characteristic of themselves was com- 
puted, and these proportions served as the 
self-endorsement values for the respective 
traits. 

Rating scale descriptions. A third independent
sample of 58 university students (28 females
and 30 males) rated themselves on the 25
traits, once for real self and once for ideal
self. As in Zimmer’s study (8, p. 447), the
real-self scale went from “7. I am a very . . .
person” to “1. I am definitely not a . . .
person,” and the ideal-self scale went from
“7. I would like to be a very . . . person” to
“1. I definitely don’t want to be a . . .
person.” For the respective measures of real
self and ideal self, the median rating-scale
value for each item served as its
probability-of-endorsement value.


Table 1

Social-Desirability Correlates on Questionnaire
and Rating Scales

Variable               Questionnaire   Real Self   Ideal Self
Social Desirability    .82**           .81**       (illegible)
Real Self                                          .63**

Q sorts. A fourth, small independent
sample of eight university students, equally 
divided between the sexes, performed three Q 
sorts on the traits. The traits were first sorted 
for real self, then ideal self, and thirdly for 
how the subjects thought the traits arranged 
themselves in terms of social desirability. 
The following quasi-normal frequency dis- 
tribution was used: From 0 (least character- 
istic) to 6 (most characteristic) the seven 
steps required frequencies of 2, 3, 4, 7, 4, 3, 
and 2, respectively. 
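The forced quasi-normal distribution can be sketched as a ranking step. Items are ordered by score and then dealt into categories 0 to 6 in the required frequencies. The same step also fits the later redistribution of the judges' median values. The function name is hypothetical, and so is the tie-breaking rule (stable input order). The judges' medians are not reported, so the example uses rank positions from the Method section as stand-in scores.

```python
from collections import Counter

# Frequencies for categories 0 (least) .. 6 (most characteristic), as in the study.
FREQUENCIES = [2, 3, 4, 7, 4, 3, 2]


def force_distribution(values, frequencies=FREQUENCIES):
    """Place items into forced Q-sort categories by ranking their scores.

    `values` maps item -> score (higher = more characteristic). Ties keep
    the input order, which is an assumption; the article does not say
    how ties were broken.
    """
    if sum(frequencies) != len(values):
        raise ValueError("frequencies must sum to the number of items")
    ranked = sorted(values, key=values.get)  # ascending; sort is stable
    placement, start = {}, 0
    for category, count in enumerate(frequencies):
        for item in ranked[start:start + count]:
            placement[item] = category
        start += count
    return placement


# Judges' rank order, most to least socially desirable (from the Method section).
TRAITS = ("respectful energetic orderly refined ambitious trusting spontaneous "
          "precise determined obedient economical ardent persistent deliberate "
          "cautious conventional leisurely daring sentimental poetic emotional "
          "wary dominant meek lusty").split()
scores = {trait: len(TRAITS) - rank for rank, trait in enumerate(TRAITS)}
placement = force_distribution(scores)
print(placement["respectful"], placement["lusty"])  # 6 0
print(sorted(Counter(placement.values()).items()))
```

Because the frequencies sum to 25, each of the traits lands in exactly one category, and the category counts reproduce the 2, 3, 4, 7, 4, 3, 2 shape.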

While it is usually considered desirable to 
have a larger number of categories and a 
larger sample of trait statements for Q sorts, 
the main interest was to demonstrate the pos- 
sible operation of social desirability on Q 
sorts and not in conducting a Q study per se. 


Results 


In order to show the relationship between 
social desirability and the probability of en- 
dorsement on the questionnaire and rating 
scales, correlations were determined between 
the rank order of the traits on the social- 
desirability continuum and those based on 
probability of endorsement. The resulting 
rank-order correlations are shown in Table 1. 
The correlation of .82 between desirability 
and probability of endorsement confirms Ed- 
wards’ (3) original finding of a correlation 
of .87 between probability of endorsement
of 140 personality items based on Murray’s 
needs and social-desirability scale values. 
Further, Hanley (5) has obtained similar re- 
sults, reporting correlations of .89 and .81 
between desirability rating and probability 
of endorsement of a random sample of items 
from the Sc and D scales of the MMPI, re- 
spectively. 
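The rank-order correlations in Table 1 are Spearman coefficients. A minimal sketch computes rho as the Pearson correlation of average ranks, which also handles tied endorsement proportions. The five desirability and endorsement values below are hypothetical and are not the study's data.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho as the Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / (vx * vy) ** 0.5


# Hypothetical values for five traits.
desirability = [6.1, 5.8, 4.0, 2.5, 1.9]      # median desirability ratings
endorsement = [0.95, 0.88, 0.70, 0.40, 0.45]  # proportion answering "yes"
print(round(spearman_rho(desirability, endorsement), 2))  # 0.9
```

With no ties, this reduces to the familiar 1 - 6Σd²/(n(n² - 1)). Here the single swapped pair gives 1 - 12/120 = .90.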

Table 1 indicates also that the social-de- 
sirability variable intrudes into the real-self 
and ideal-self rating scales to a similar mag- 
nitude as it does into questionnaire items. 
The correlation of .63 between the rank or- 
ders of the real self and ideal self is un- 
doubtedly a partial reflection of social de- 
sirability. Thus, the results indicate no sup- 
port for the hypothesis that social desirability 
plays a differential role in these two person- 
ality-evaluation techniques, at least for the 
items employed in this study. 

The relationships existing between the vari- 
ous Q sorts are given in Table 2. The values 
presented under the real self vs. social desir- 
ability columns represent the correlations be- 
tween each individual’s real-self sort and his 
social-desirability sort and between each in- 
dividual’s real-self sort and the redistribution 
of the 67 judges’ median social-desirability 
values of the traits to fit the forced intervals 
used in the Q sorts. A similar analysis, for 
ideal self, is expressed in the last two columns 
of Table 2. The correlations between social
desirability and the real-self and ideal-self
sorts are highlighted in this table. Thus the
data support the Edwards and Horst (4)
formulation that social desirability should
enter in Q sorts. It is of some interest to
note that the real self and ideal self correlate
significantly with the independent judges’
conception of social desirability in every case
but two.

To assess the influence of social desirability
on the Q sorts taken as a whole, the eight
scores for each trait item were averaged to
form a mean score. On the basis of their mean
scores, the 25 traits were redistributed into
the categories of the quasi-normal frequency
distribution. This procedure was followed
separately for the real-self, ideal-self, and
social-desirability sorts. Table 3 reports the
intercorrelations between these averaged
self-assessments. Again the operation of social
desirability on Q sorts is confirmed.


Table 3

Composite Q-Sort Relationships

Variable               Real Self   Ideal Self
Social Desirability    .66**       .59**
Real Self                          .82**


Discussion 


A major implication of the present findings 
is that social desirability operates in a sig- 
nificant fashion in all three personality-evalu- 
ation techniques. No evidence was found to 
support the initial expectancy that social de- 
sirability would play a differential role in 
the three procedures. 

The results also have practical implica- 
tions for investigators who test the twofold 
clinical hypothesis that maladjustment is in- 
dicated by a discrepancy between real self 
and ideal self and that the discrepancy
between the two selves should decrease as
therapy proceeds. Unless the social-desirability
variable is controlled, the specific variance 
in the difference score between real self and 
ideal self will be negligible because social de- 
sirability will cancel out any reliable differ- 
ence between the two selves. That is, the 
“inflated” correlation between real self and 
ideal self has an important influence on the 
specific and error variance measured by any
computed discrepancy (difference) score. The 
net effect of a high correlation between the 
two selves is to reduce specific variance and 
to increase error variance in the discrepancy 
score. Hence, it would be fruitless to investi- 
gate whether or not a discrepancy index is 
related to personality maladjustment if social 
desirability produced a high correlation be- 
tween real self and ideal self. Furthermore, 
the error variance in the difference score will 
be large if the respective reliabilities of the 
real-self and ideal-self measures are low. 
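The variance argument can be made concrete with the classical formula for the reliability of a difference score, r_DD = ((r_XX + r_YY)/2 - r_XY) / (1 - r_XY). This formula assumes that the two measures have equal variances. It comes from standard test theory and is not stated in the article. The reliabilities of .85 used below are assumed for illustration. The .63 and .82 are the real-ideal correlations reported in Tables 1 and 3.

```python
def difference_score_reliability(r_xx, r_yy, r_xy):
    """Classical reliability of D = X - Y, assuming X and Y have equal variances."""
    if r_xy >= 1:
        raise ValueError("r_xy must be below 1")
    return ((r_xx + r_yy) / 2 - r_xy) / (1 - r_xy)


# Assumed reliabilities of .85 for both the real-self and ideal-self measures.
for r_xy in (0.30, 0.63, 0.82):
    print(r_xy, round(difference_score_reliability(0.85, 0.85, r_xy), 2))
# 0.3 0.79
# 0.63 0.59
# 0.82 0.17
```

Holding the component reliabilities fixed, raising the real-ideal correlation from .30 to .82 drops the reliability of the discrepancy score from about .79 to about .17. This is the loss of specific variance the paragraph describes. Lowering `r_xx` or `r_yy` lowers it further.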

It is obvious that the specific variance of 
the discrepancy score may be increased by 
having a large number of traits rated and by 
controlling the influence of the social-desir- 
ability variable. The former procedure has, 
of course, its exceptions. But, how might one 
control for the latter variable? For struc- 
tured Q samples, Edwards and Horst (4) 
suggest that it can be controlled by having 
for each of the two or more kinds of items a 
balancing in terms of social-desirability scale 
values. For instance, if the structured sample 
is composed of only surgency and desurgency 
items, they would have for every item in the 
surgency group one in the desurgency group 
matched for social desirability. However, they 
make no suggestions for unstructured Q sam- 
ples or for rating scale items, such as used in 
this study. For unstructured samples, social 
desirability could be partially controlled by 
utilizing traits which are ambiguous in terms 
of social desirability. Ambiguity could be de- 
fined in terms of the standard deviations of 
the social-desirability ratings. Another pro- 
cedure for controlling social desirability in 
unstructured samples would be to use only 
neutral trait items, that is, those items near 
the median of judged desirability. 
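The three control procedures can be sketched as simple item-selection rules. The first matches items across two groups on social-desirability scale value. The second keeps ambiguous items, that is, those with a large SD of ratings. The third keeps neutral items, whose median lies near the median of all items. The function names and ratings are hypothetical. Greedy nearest-value matching is a stand-in chosen here, because Edwards and Horst do not specify an algorithm.

```python
from statistics import median, stdev


def neutral_items(ratings, k):
    """The k items whose median rating lies nearest the median of all
    item medians, i.e. closest to the middle of judged desirability."""
    medians = {item: median(r) for item, r in ratings.items()}
    center = median(medians.values())
    return sorted(medians, key=lambda item: abs(medians[item] - center))[:k]


def ambiguous_items(ratings, k):
    """The k items on which judges disagree most (largest rating SD)."""
    return sorted(ratings, key=lambda item: stdev(ratings[item]), reverse=True)[:k]


def matched_pairs(group_a, group_b, scale_values):
    """Greedy balancing: pair each group_a item with the unused group_b
    item closest in social-desirability scale value."""
    if len(group_b) < len(group_a):
        raise ValueError("group_b needs at least as many items as group_a")
    unused = set(group_b)
    pairs = []
    for a in sorted(group_a, key=scale_values.get):
        b = min(sorted(unused),
                key=lambda item: abs(scale_values[item] - scale_values[a]))
        unused.remove(b)
        pairs.append((a, b))
    return pairs


# Hypothetical 7-point desirability ratings from five judges.
ratings = {
    "respectful": [7, 7, 6, 7, 6],
    "cautious": [4, 5, 4, 4, 5],
    "daring": [2, 6, 3, 7, 4],
    "meek": [2, 1, 2, 2, 3],
    "leisurely": [4, 4, 3, 5, 4],
}
print(neutral_items(ratings, 3))    # ['cautious', 'daring', 'leisurely']
print(ambiguous_items(ratings, 1))  # ['daring']
```

An item can be both neutral and ambiguous. In the toy data, "daring" has a middling median but widely split ratings, so the two criteria are not interchangeable.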

Finally, it should be noted that the present 
data cast doubt upon Zimmer’s negative find- 
ings on testing the hypothesis that a real- 
ideal discrepancy indicates conflict. It will be 
recalled that the present study employed the 
same traits as those used by Zimmer. Be- 
cause the present findings indicate that these 
traits are “loaded” with social desirability, 
stereotypic responses to them undoubtedly 
occurred in Zimmer’s study. As a result, little 
personal involvement may have entered into 
his subjects’ judgments. In short, unequivocal
evidence against the discrepancy hypothesis 
requires rigorous control over the social-de- 
sirability variable. 


Summary 


This study was designed to indicate the 
necessity for controlling the social-desirabil- 
ity factor in testing the twofold clinical hy- 
pothesis regarding discrepancy between real 
self and ideal self and to assess the relative 
influence of social desirability on personality 
questionnaires, rating scales, and Q sorts. The 
findings support the conclusion that social de- 
sirability enters into the three personality 
techniques to about the same extent. The re- 
sults further suggest that Zimmer’s conclu- 
sion that discrepancies between real self and 
ideal self are not indicative of conflict is open 
to question because of his failure to control 
for the social-desirability factor. 

It was emphasized that the control of so- 
cial desirability is an indispensable aspect of 
any study testing the clinical hypothesis 
about real-ideal self discrepancy. Procedures 
for increasing the specific variance of discrep- 
ancy indices and for controlling the social- 
desirability variable were discussed. 
The Evaluation of the Significance of Differences
Between Scaled Scores on the WAIS: 
The Perpetuation of a Fallacy 


H. Gwynne Jones 


Institute of Psychiatry (Maudsley Hospital), University of London 


In the WAIS manual, Wechsler claims that
“. . . about two-thirds of the observed differences
between scores on any two tests do
not exceed three scaled score points. Differ- 
ences as large as five points may be unusual 
enough to be noteworthy; smaller differences 
should be recognized as more common, there- 
fore less likely to be significant” (3, p. 18). 
In making this statement Wechsler is inviting 
the clinical psychologist using the test to
conclude that such discrepancies, being rare, are
indicative of pathology, a practice already
common, largely as a result of Wechsler’s 
earlier writings (2). In fact, differences of 
this order of magnitude are very common, 
even in the standardization sample, as is evi- 
dent from Wechsler’s own argument. 

There are 55 possible pairings for the eleven 
tests in the scale. Wechsler examined the cor- 
responding 55 distributions of differences for 
the 25-34 age group, found that the median 
value of the standard deviations was approxi- 
mately three, and deduced the proposition 
quoted above. His choice of a difference of 
five points as a critical level represents an 
SD value of approximately 1.7 and, there- 
fore, significance at the 9 per cent level. This 
percentage does not, however, refer to the 
proportion of individual subjects displaying 
such a discrepancy but to the proportion of 
differences of which 55 occur for each sub- 
ject. Therefore, on Wechsler’s reasoning, some 
four to five discrepancies of this magnitude 
are to be expected in any one record. 
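The arithmetic behind this argument can be checked with the normal curve. Eleven subtests give 55 pairings. A 5-point difference against an SD of differences of about 3 is roughly 1.7 SD. The expected number of such differences per record is 55 times the two-tailed tail probability. The helper name is hypothetical, and the normal approximation ignores the discreteness of scaled scores.

```python
from math import comb, erf, sqrt


def two_tailed_p(z):
    """Normal-curve probability that |Z| >= z."""
    phi = 0.5 * (1 + erf(z / sqrt(2)))
    return 2 * (1 - phi)


pairs = comb(11, 2)        # 55 pairings of eleven subtests per record
sd_of_differences = 3      # Wechsler's approximate median SD of differences

for z in (5 / sd_of_differences, 1.7):   # unrounded ratio, then the text's 1.7
    p = two_tailed_p(z)
    print(f"z={z:.2f}  p={p:.3f}  expected per record={pairs * p:.1f}")
```

With z = 1.7, as in the text, p is about .089 and a record should show about 4.9 such differences. The unrounded 5/3 gives about .096 and 5.3. The same function confirms Wechsler's "about two-thirds" within ±1 SD (1 - p at z = 1 is about .68).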

Wechsler’s statement is, of course, more 
apt if one is concerned, not with a difference 
between any two tests, but with a difference 
between two specified tests as such specific
pairings occur only once in each record. Such 
usage, however, is only justified when there 
are prior grounds for selecting the specific 
difference examined. Thus, for example, if it 
is predicted on independent grounds that a 
certain subject will achieve an abnormally 
low score on Block Designs as compared to 
Vocabulary then Wechsler’s method may be 
applied though, of course, the direction of 
the difference is then of importance and a 
one-tailed test may be applied. In such cir- 
cumstances, however, it would be unwise to 
make use of Wechsler’s approximate average 
standard deviation as, owing to the wide 
range of correlation between different tests, 
the magnitude of the standard deviation 
varies considerably. Thus, for example, the 
correlation of .81 between Vocabulary and In- 
formation yields a standard deviation of dif- 
ferences below two scaled score points while 
the correlation of .30 between Object As- 
sembly and Digit Span raises the value to 
over 3.5. 
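The dependence on the subtest intercorrelation follows from the standard result for two scores with a common SD σ and correlation r: SD of the difference = σ√(2(1 - r)). Wechsler scaled scores have an SD of 3. The helper names below are hypothetical. The `z` passed to `critical_difference` is the user's choice, for example 1.645 for the one-tailed test mentioned above.

```python
from math import sqrt

SCALED_SCORE_SD = 3  # Wechsler scaled scores: mean 10, SD 3


def sd_of_difference(r, sd=SCALED_SCORE_SD):
    """SD of X - Y when X and Y share the same SD and correlate r."""
    return sd * sqrt(2 * (1 - r))


def critical_difference(r, z, sd=SCALED_SCORE_SD):
    """Smallest difference reaching z for this particular pair of subtests."""
    return z * sd_of_difference(r, sd)


print(round(sd_of_difference(0.81), 2))  # Vocabulary vs Information: 1.85
print(round(sd_of_difference(0.30), 2))  # Object Assembly vs Digit Span: 3.55
```

An intercorrelation of .50 reproduces Wechsler's average figure of exactly 3. That is why a single average SD misstates the critical difference for highly correlated and weakly correlated pairs alike.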

As the writer has only recently received 
the WAIS test material, empirical data de- 
rived from that test cannot be quoted in sup- 
port of the arguments advanced in this paper. 
Similar reasoning can, however, be applied to 
test results derived from the earlier Wechsler- 
Bellevue Scale (2). The scaled scores of 32 
Kansas Highway Patrolmen, rated as well 
adjusted, are reported by Rapaport (1, p. 
521). If examined for subtest discrepancies, 
these data show at least one difference of at 
least five scaled score points for all but one 
subject. In all 32 records there are 251
differences of this magnitude, representing some
14 per cent of the 1,760 differences involved. 
Sixty-one differences, or 3.5 per cent of the 
total, are of seven or more scaled score points, 
and eight differences reach ten points or more. 
When the 110 possible specific, directional 
discrepancies are examined, 56 of these fail 
to reach six scaled score points in any record. 
Of the 54 which achieve this magnitude or 
greater, 17 occur once, 19 twice, 7 three 
times, 7 four times, and 4 five times. 
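The counting procedure applied to the Rapaport records can be sketched in two steps. The first counts unordered pairs that differ by 5 or more points in one record. The second tallies, across records, each of the 110 ordered (directional) subtest pairs that reach 6 points. The record below is hypothetical and does not reproduce Rapaport's data. The function names are also illustrative.

```python
from itertools import combinations, permutations


def large_differences(scores, threshold=5):
    """Number of unordered subtest pairs in one record whose scaled scores
    differ by at least `threshold` points."""
    return sum(1 for a, b in combinations(scores, 2) if abs(a - b) >= threshold)


def directional_tally(records, threshold=6):
    """For every ordered pair (i, j) of subtests, count the records in which
    subtest i exceeds subtest j by at least `threshold` points."""
    n = len(records[0])
    tally = {pair: 0 for pair in permutations(range(n), 2)}
    for scores in records:
        for i, j in tally:
            if scores[i] - scores[j] >= threshold:
                tally[(i, j)] += 1
    return tally


# One hypothetical record of eleven scaled scores.
record = [10, 12, 9, 14, 11, 8, 13, 10, 7, 12, 11]
print(large_differences(record))             # 7 of the 55 pairs
print(len(directional_tally([record])))      # 110 ordered pairs
print(32 * 55, round(251 / (32 * 55), 3))    # 1760 0.143
```

The last line reproduces the totals in the text: 32 records times 55 pairs gives 1,760 differences, of which 251 is about 14 per cent.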

Wechsler promises to discuss fully the relation
of his findings concerning subtest differences
to problems of patterning in a forthcoming
book. It is to be hoped that he will clarify
the issues discussed in this paper. If the wide
range of random scatter found in normal
protocols were fully realized, much of the
effort now expended in research on scatter
analysis would be more fruitfully applied
elsewhere. Though group patterns may be
established, the range of individual deviations
will be so great as to render the procedure
useless for individual application by clinical
psychologists.
TEST


Edwards, Allen L. Edwards Personal Preference
Schedule (PPS). New York: Psychological Corp.,
1953, 1954.


Reviewed by 


J. Richard Wittenborn 


Rutgers University 


The Personal Preference Schedule, a thoughtfully 
constructed instrument, embodies merits expressive 
of contemporary psychology. Although the author 
emphasizes the value of his schedule as an instru- 
ment for research, it comprises features which may 
prove to be of great merit in practical assessment. 

Based on Murray’s system of manifest needs, it 
provides scores for: 1. Achievement, 2. Deference, 
3. Order, 4. Exhibition, 5. Autonomy, 6. Affiliation, 
7. Intraception, 8. Succorance, 9. Dominance, 10. 
Abasement, 11. Nurturance, 12. Change, 13. Endur- 
ance, 14. Heterosexuality, and 15. Aggression. The 
need scores emerge from a procedure whereby the 
subject indicates his preference in each of a series of 
225 pairs of experiences, activities, or situations. 
These pairs are provided by a matching of each of 
15 statements (expressive of one of the needs) with 
every other one. The selection of the statements to 
be paired was guided by consideration of the social 
desirability of the statements. It was hoped that 
having the members of the pairs equal with respect 
to social desirability would reduce the probability 
of the need scores being confounded with their so- 
cial desirability. 
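The item count of 225 can be reconciled with the pairing description. Pairing each of 15 statements once with every other gives only 105 distinct pairs. The manual, as summarized in the second review, pairs each need with every other need twice (210 items) and repeats 15 items for the consistency score, for 225 in all. The sketch below builds that plan. Which 15 items are repeated is an assumption here, and the function name is hypothetical.

```python
from collections import Counter
from itertools import combinations


def schedule_plan(n_needs=15, n_repeats=15):
    """Hypothetical item plan: each pair of needs is presented twice, and
    n_repeats of those items are presented again for the consistency score.
    Which items are repeated is an assumption made for illustration."""
    paired = [pair for pair in combinations(range(n_needs), 2) for _ in range(2)]
    return paired, paired[:n_repeats]


paired, repeats = schedule_plan()
print(len(paired), len(repeats), len(paired) + len(repeats))  # 210 15 225
appearances = Counter(need for pair in paired for need in pair)
print(set(appearances.values()))  # {28}: each need appears 28 times
```

Since each need meets each of the other 14 needs twice, every need appears in 28 of the 210 paired items. No need is over- or under-represented in the scored comparisons.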

Although the paired-comparison structure of this 
instrument may control social desirability and re- 
duce the effect of other irrelevant or confounding 
needs and biases, it may involve possible disadvan- 
tages. For example, the subject must express a pref- 
erence for one of two modes of response both of 
which may be equally disliked and, in addition, the 
intensity of the need may remain obscured. It is 
therefore possible that a given statement of mani- 
fest need which is consistently highest in this set of 
15 may not represent a truly intense need for the 
individual. Perhaps, however, it is too much to ex- 
pect an instrument which shows the relative in- 
tensity of needs also to show their absolute intensity. 

Both split-half and retest reliabilities are given for 
the various scores. The corrected split-half correlations
range from .60 to .87 with a median of .77,
and the retest correlations range from .74 to .88 with
a median of .78. Norms, provided for both men and 
women college students, are expressed in terms of T 
scores as well as percentile ranks. The 1509 cases 
comprising the standardizing group were drawn from 
many different colleges broadly representative of geo- 
graphical areas and types of institutions. 

An interesting feature of the PPS is the fact that 
it yields scores which are relatively independent de- 
spite the fact that they are relatively reliable. The 
highest subtest intercorrelation, .46, is between
Nurturance and Affiliation, certainly no more of a
relationship than would be anticipated from the
similarity of the constructs represented. These two
variables, possibly with a third variable, Abasement, 
seem to bear a small but systematic, negative rela- 
tionship with such variables as achievement, au- 
tonomy, and dominance. These and other possible 
consistencies which may be inferred among the in- 
tercorrelations are plausible and do not detract from 
one’s confidence that the variables may be usefully 
expressive of the need achievement system from 
which they draw their labels. 

The Personal Preference Schedule is a modern in- 
strument in the particular respects that it is de- 
signed to measure personality constructs which have 
a conceptual origin independent of the instrument 
itself; the subtests are not only reliable but rela- 
tively independent so that they are free from a con- 
fusing and uneconomical overlapping implication; 
and there is a built-in estimate of the reliability of 
the individual’s performance in the form of a self- 
consistency score. The user also enjoys substantial 
protection from the possibility that the manifest 
need scores could be confounded to an important 
degree by the need to appear socially desirable. The 
possibility of personality schedule scores being con- 
founded with a tendency to present a socially desir- 
able facade is often considered to be a fatal defect 
among such instruments. Edwards not only mini- 
mizes this possibility by pairing the statements on 
the basis of their predetermined social desirability, 
but he scrupulously examines the degree to which 
he has succeeded in doing this by the use of other 
scales and inventories. Incidentally, it is possibly the 
elimination of the confounding with social desir- 
ability which permits the relatively low intercorre- 
lations between the scales. 

When one thinks of the validity of an instrument 
traditionally, one thinks of the degree to which its 
scores are related to some explicit, practical
criterion for some definite sample. Such relationships,
however, are but an expression of a possible conse- 
quence of validity; they are not the basis for va- 
lidity. A certain type of empirical validity may be 
relatively easy to show for most aptitude or achieve- 
ment tests. Most such tests comprise a set of items 
or performances which is in effect a sample from 
a hypothetical population of performances, and the 
sample of performances comprising the criterion may 
also be conceived as drawn from the same hypotheti- 
cal population of performances. This “sampling” ap- 
proach, which characterizes most of our testing ef- 
forts, should require no more than a reliable test 
sample of responses and reliable criterion sample of 
responses in order for empirical validity to be dem- 
onstrable. Edwards’ instrument should not be con- 
fused with this tradition in testing. The Personal 
Preference Schedule is intended to provide a reli- 
able and convenient means for determining the rela- 
tive intensity of 15 fairly well-known and previ- 
ously explored motivational systems. It would be 
very desirable to show relationships between the 
various manifest need scales provided by the PPS 
and other measures for these needs, or manifesta- 
tions of behavior relevant to them. Nevertheless, it 
is not necessarily incumbent upon the author to 
provide such relationships in order to claim that he 
has created a Personal Preference Schedule based 
on the need system. 

The Personal Preference scales have been corre- 
lated with other scales and inventories and the re- 
sults do not detract from the hypothesis that the 
scales do distinguish between individuals in a man- 
ner appropriate to the constructs of need which pro- 
vide the labels. “Validity” in the pragmatic sense 
cannot yet be claimed for this instrument, however. 

Apparently, there are numerous studies under way 
which employ this instrument and we shall learn 
about the possible ways in which Edwards’ scales 
for manifest needs are related to a wide variety of 
behaviors. In a recent communication with the au- 
thor, it was learned that the published literature 
now includes 5 references to the Personal Preference 
Schedule and that 32 additional inquiries which are 
not yet reported involve its use. It would appear, 
therefore, that the prospects for a psychologically 
relevant, conceptual validation of this instrument, 
and of the need constructs on which it is based, are 
very good. A useful construct of the behaviors 
validly expressed by a test may eventually have 
much more practical value than a psychologically 
indefinable, conglomerate test which is shown to be 
empirically valid for some one specific criterion of 
useful behavior. 

At the present time this reviewer finds only two 
aspects of the total procedure which may require 
some immediate attention. First, the relative homo- 
geneity or heterogeneity of the scales with respect 
to the intercorrelations among their component items 
has not been examined; this could be important. 
Second, the answer sheet, although faultlessly
designed for the rapid scoring and collating of results,
is a little tedious for the person who is taking the 
test. From the standpoint of scoring and interpreta- 
tion, it would be very difficult indeed to conceive of 
a more convenient and agreeable procedure. The 
schedule is virtually self-scoring and the profile on 
the reverse of the answer sheet is self-interpreting. 


Edwards, Allen L. Edwards Personal Preference 
Schedule (PPS). New York: Psychological Corp., 
1953, 1954. 


Reviewed by 


John W. Gustad 
University of Maryland 


In developing the Personal Preference Schedule, 
Edwards was seeking “...to provide quick and 
convenient measures of a number of relatively in- 
dependent normal personality variables.” Further, 
the instrument was intended to be useful primarily 
in counseling and in research. Both objectives are 
highly desirable. Far too little attention has been 
paid to the development of measures relevant to the 
normal range of behavior. 

The 15 scales of the PPS ostensibly measure the 
manifest needs proposed by Murray. There is an ad- 
ditional score, a measure of consistency, which was 
developed to provide information relative to the 
subject’s test-taking behavior. In addition to its ap- 
plications to counseling and research, Edwards sug- 
gests that the schedule will be useful for classroom 
demonstrations, in selection problems, and in in- 
verted factor analysis. If its usefulness for these pur- 
poses can be demonstrated, the PPS will be a signifi- 
cant addition to the psychologist’s armamentarium. 

The coincidence of the dates of publication of 
this instrument and of the APA’s Technical Recom- 
mendations for Psychological Tests and Diagnostic 
Techniques suggested a preliminary method for as- 
sessing its adherence to good practice in develop- 
ment. A check list was developed containing all 
items in the APA pamphlet which were both rele- 
vant to this kind of instrument and which were 
marked essential. The test manual was then com- 
pared with these points, and a decision was reached 
as to whether the PPS conformed to the essential 
recommendations. In 54 out of 60 cases it did con- 
form, giving it a “score” of 90 per cent. On the face 
of it, this would seem to be a favorable sign. Un- 
fortunately, the “score” did not tell the whole story. 
The percentage of “hits” varied from 100 on recom- 
mendations regarding administration to 16 on va- 
lidity. 

The major sections of the manual provide a help- 
ful framework for discussion. The section on de- 
velopment is given over largely to a discussion of 
the method followed for reducing the effects on re- 
sponses of the factor of social desirability. Evidence 
is cited later, in the section on validity, which
indicates that this attempt was largely successful.
Missing from the section on development is information
as to the sources of items and the methods of item 
selection. The reader is told that each of Murray’s
15 needs is paired twice with every other need, which
yields 210 paired items; with the 15 items repeated for
the consistency score, the schedule totals 225 items.
It would be interesting to know how the items were
selected and built.
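The item count can be checked directly. The sketch below assumes, as the later description of the consistency score implies, that the 15 repeated items are added to the paired items. The constant names are illustrative.

```python
from itertools import combinations

N_NEEDS = 15          # Murray's manifest needs scored by the PPS
TIMES_EACH_PAIR = 2   # each need is paired with every other need twice
REPEATED_ITEMS = 15   # items presented a second time for the consistency score

distinct_pairs = len(list(combinations(range(N_NEEDS), 2)))  # 15 * 14 / 2 = 105
paired_items = distinct_pairs * TIMES_EACH_PAIR              # 210
total_items = paired_items + REPEATED_ITEMS                  # 225
items_per_need = (N_NEEDS - 1) * TIMES_EACH_PAIR             # each need appears in 28 paired items

print(distinct_pairs, paired_items, total_items, items_per_need)
```

The same figure of 105 distinct pairs reappears later as the number of entries in the table of scale intercorrelations.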

Directions for scoring and administration are fairly 
simple and straightforward. The PPS is untimed and 
requires about 40 minutes’ working time for most of the
college students for whom it was built. It is not
quite self-administering, but chances for error seem 
small. Scoring is simple and can be done by clerks. 
Only when the omission of items requires prorating 
is there any likelihood of difficulty, and this hazard 
is small. Raw scores may easily be plotted on a
profile based on T scores; percentile equivalents are 
also available. 
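Scoring and profiling can be sketched as follows. The manual's actual prorating rule and norm values are not described here, so both functions rest on stated assumptions. Prorating is taken to be proportional to the number of items answered. Each need is taken to appear in 28 paired items (14 other needs, twice each). The T score is taken to be the linear transformation with mean 50 and standard deviation 10. The norm mean and SD in the usage lines are hypothetical.

```python
def prorate(raw: int, answered: int, total: int = 28) -> float:
    """Scale a raw need score to full length, assuming proportional prorating."""
    if not 0 < answered <= total:
        raise ValueError("answered must be between 1 and total")
    return raw * total / answered


def t_score(raw: float, norm_mean: float, norm_sd: float) -> float:
    """Linear T score: 50 at the norm-group mean, 10 points per norm-group SD."""
    if norm_sd <= 0:
        raise ValueError("norm_sd must be positive")
    return 50.0 + 10.0 * (raw - norm_mean) / norm_sd


# Hypothetical example: 12 choices of a need among the 24 of its 28 items answered.
adjusted = prorate(12, answered=24)                       # 14.0
print(t_score(adjusted, norm_mean=14.0, norm_sd=4.0))     # 50.0
```

Separate norm values would be supplied for men and women, since separate norms for the sexes are provided.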

Norms are based on 1509 cases—760 men, 749 
women—all students drawn from a number of arts 
colleges. No information is given regarding the con- 
ditions under which the students took the PPS. 
There is a mean difference of 1.69 years between
the sexes in favor of the men, but separate norms
for the sexes are provided.

Users of the PPS are advised to determine for 
themselves the limits for high and low scores. Edwards
provides tentative cutting points: T scores of 70 and
above are high, 30 and below are low, and the
intermediate scores fall into 10-point T-score ranges.
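Edwards' tentative cutting points reduce to a simple banding rule. In this sketch the band labels are our own. Because users are advised to set their own limits, the `high` and `low` cutoffs are parameters rather than fixed values.

```python
def classify_t(t: float, high: float = 70.0, low: float = 30.0) -> str:
    """Band a T score with Edwards' tentative cutting points (labels are illustrative)."""
    if t >= high:
        return "high"
    if t <= low:
        return "low"
    start = int(t // 10) * 10          # lower edge of the 10-point band containing t
    return f"intermediate ({start}-{start + 9})"


for t in (72, 65, 50, 41, 30):
    print(t, classify_t(t))
```

A value just above 30 falls into the band labelled 30-39. A T score of exactly 30 never reaches that band, because it is already classed as low.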

As mentioned, the validity check or measure of 
test-taking attitude is provided by the consistency 
score, a measure “. . . based on a comparison of 
the number of identical choices made in two sets of 
the same 15 items” (Manual, p. 6). Edwards shows 
that the distribution of consistency scores departs 
from that based on the expanded binomial and concludes
that the score is effective. He suggests that a consistency
score of 9 or less be considered grounds for questioning
the subject’s test-taking attitude. Further
evidence on the same problem is based on coeffi- 
cients of profile stability. Ninety-three per cent of 
these exceeded chance in the standardization group. 
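The consistency score and its comparison with chance can be sketched as below. The binomial model assumes that a subject answering at random would repeat his earlier choice on each repeated item with probability one half, independently across the 15 items. The function names are hypothetical.

```python
from math import comb


def consistency_score(first: list[str], second: list[str]) -> int:
    """Number of identical choices across the two presentations of the repeated items."""
    if len(first) != len(second):
        raise ValueError("both sets must contain the same items")
    return sum(a == b for a, b in zip(first, second))


def chance_at_least(k: int, n: int = 15, p: float = 0.5) -> float:
    """P(score >= k) under the binomial model of random responding."""
    return sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k, n + 1))


print(consistency_score(list("ABABA"), list("ABBBA")))   # 4
print(round(chance_at_least(10), 4))                      # 0.1509
```

Under this model a random responder reaches 10 or more identical choices with probability 4944/32768, about .15. A cutoff of 9 or less would therefore flag roughly 85 per cent of purely random records.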

Two kinds of reliability estimates are provided: 
coefficients of internal consistency (split-half) and 
of stability (test-retest). The former are reported 
after correction by the Spearman-Brown formula; 
the uncorrected coefficients are not reported. The 
stability coefficients are based on a sample of 89 
University of Washington students, a surprisingly
small group for such a purpose. The interval be- 
tween tests was one week. The internal consistency 
coefficients range from .60 for Deference to .87 for 
Heterosexuality; the stability coefficients range from
.74 for Achievement to .88 for Abasement, certainly
as high as those reported for most similar tests. 
Whether they should be higher is a question im- 
bedded in reliability theory as it applies to person- 
ality measures. There is a question about the use- 
fulness of stability coefficients based on a week’s 
interval; it is to be hoped that further work will in- 
clude longer time intervals and a larger sample of 
subjects. 
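The Spearman-Brown correction for a split-half coefficient is r_full = 2r_half / (1 + r_half). Because it is a monotone function, the uncorrected coefficients that the manual omits can be recovered from the corrected ones. The sketch below computes both directions, using function names of our own choosing.

```python
def spearman_brown(r_half: float) -> float:
    """Step a split-half correlation up to the reliability of the full-length test."""
    return 2 * r_half / (1 + r_half)


def half_test_r(r_full: float) -> float:
    """Invert the correction: the half-test correlation implied by a corrected value."""
    return r_full / (2 - r_full)


# Corrected internal-consistency extremes reported above.
for scale, r in (("Deference", 0.60), ("Heterosexuality", 0.87)):
    print(scale, round(half_test_r(r), 2))
```

On this basis, the reported .60 for Deference corresponds to a correlation of about .43 between the two halves, and the .87 for Heterosexuality corresponds to about .77.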
Intercorrelations among the 15 scales are presented.
Of the 105 entries, 96 are statistically significant.
The majority are negative. The highest intercorrelation
reported is .46; most are less than .20. It would
appear that, by and large, Edwards has succeeded
in producing an instrument with relatively independent
scales. Since construct validity is involved, a
factor analysis would certainly seem to be a requisite 
next step. It should have been done and reported 
in the manual. 
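Two pieces of arithmetic help in reading the intercorrelation table. First, 15 scales yield 15 x 14 / 2 = 105 distinct coefficients. Second, with a large sample even a small r reaches significance. The sketch uses Fisher's z approximation for the two-tailed critical value. The sample size in the usage line is an assumption, because the review does not state how many cases the intercorrelations were computed on.

```python
import math
from statistics import NormalDist


def n_intercorrelations(k: int) -> int:
    """Distinct off-diagonal entries in a k-by-k correlation matrix."""
    return k * (k - 1) // 2


def critical_r(n: int, alpha: float = 0.05) -> float:
    """Approximate two-tailed critical |r| for testing rho = 0 (Fisher's z)."""
    if n <= 3:
        raise ValueError("n must exceed 3")
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return math.tanh(z / math.sqrt(n - 3))


print(n_intercorrelations(15))          # 105
# Assumption: the coefficients came from the full normative group of 1509.
print(round(critical_r(1509), 3))       # about .05
```

If the normative group was in fact used, any coefficient beyond about .05 would be significant. This would square with 96 of the 105 entries being significant even though most are below .20.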

For all tests, validity is the ultimate question. 
Here the PPS is most deficient, at least insofar as it 
is possible to tell from the published manual. A list 
of research projects in process is included, but, in 
the absence of information about the results ob- 
tained, this list is of limited interest. Unless satis- 
factory validity can be demonstrated, all other fea- 
tures of a test tend to be meaningless. 

The APA Technical Recommendations describes 
four kinds of validity: content validity, predictive 
validity, concurrent validity, and construct validity. 
The PPS manual deals only with the last of these. 
As indicated above, Edwards was seeking to develop 
measures for the needs described by Murray. It 
would seem necessary, therefore, that he demon- 
strate that his scales do in fact measure these needs. 
No such evidence is presented. 

Admittedly, construct validity is difficult to establish,
but the difficulty in no way reduces the necessity
for establishing it. The recommended approaches
to construct validation are described in the APA
pamphlet as follows:

“Essentially, in studies of construct validity, we 
are validating the theory underlying the test. The 
validation procedure involves two steps. First, the 
investigator inquires: From this theory, what pre- 
dictions would we make regarding variations of 
scores from person to person or occasion to occa- 
sion? Second, he gathers data to confirm these pre- 
dictions” (Technical Recommendations, p. 14). It is 
also suggested that factor analysis is one method for 
establishing construct validity. 

One part of the section on validity contains reports
of studies based on self-ratings by subjects.
Although correlational analyses are implied, results
are presented verbally as follows: “The self-rankings 
of some subjects agreed perfectly with their rank- 
ings on the PPS” (p. 13). Unless one paid proper 
attention to the word “some,” he could infer a rho 
of 1.0. What he might conclude after noting the use 
of the word “some” remains in doubt. Edwards con- 
cludes this section of the manual with the following 
statement: “It is not clear, however, how even per- 
fect agreement between self-rating and inventory 
scores could be interpreted as bearing upon the na- 
ture of the variable being measured by the inven- 
tory” (p. 13). In view of the lack of clarity, the 
omission of this section might have been desirable.
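The ambiguity of "some" could be removed by reporting, for every subject, the rank correlation between his self-ranking of the needs and his PPS ranking. A sketch of Spearman's rho for untied ranks follows. We assume that each subject ranked all 15 needs; the manual is not said to state this.

```python
def spearman_rho(ranks_a: list[int], ranks_b: list[int]) -> float:
    """Spearman's rho for two untied rankings: 1 - 6*sum(d^2) / (n*(n^2 - 1))."""
    n = len(ranks_a)
    if n != len(ranks_b) or n < 2:
        raise ValueError("need two rankings of the same length, n >= 2")
    d_squared = sum((a - b) ** 2 for a, b in zip(ranks_a, ranks_b))
    return 1 - 6 * d_squared / (n * (n**2 - 1))


self_rank = list(range(1, 16))                    # hypothetical self-ranking of 15 needs
print(spearman_rho(self_rank, self_rank))         # 1.0: perfect agreement
print(spearman_rho(self_rank, self_rank[::-1]))   # -1.0: complete reversal
```

Reporting the distribution of these coefficients across subjects would show how typical perfect agreement was. The bare statement that some subjects reached 1.0 does not.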

The principal line of validation rests on correlating 
PPS scores with scores on the Guilford-Martin and 
the Taylor Manifest Anxiety Scale. The Cooperativeness,
Agreeableness, and Objectivity scales of the
former were used, and the correlations are based on a total of 106 cases.
With the Guilford-Martin, the PPS scales correlated
significantly in 4 cases with Cooperativeness, 11 
cases with Agreeableness, and 2 cases with Objec- 
tivity. There were only 2 significant correlations 
with the Taylor. One conclusion which might be 
drawn from the above is that the PPS measures 
Agreeableness or lack thereof. 

The comments above suggest a dilemma. Either 
Edwards has based his claims for the validity of the 
PPS on construct validity or on some other kind. 
Since the manual contains nothing to support claims 
of any other kind of validity, we must consider his 
efforts in regard to construct validity. He has cor- 
related his scales with four others, drawn from the 
Guilford-Martin and the Taylor. The significant correlations
are not interpreted as bearing on the construct
validity of the PPS. That is to say, no reasons are 
given to support the notion that the PPS does in 
fact measure the manifest needs proposed by Murray.
It would seem, therefore, that the only conclusion
to be drawn is that no usable information is
presented regarding the validity of the PPS. 

It is the responsibility of the test author and the 
test publisher to establish the validity of any in- 
strument made available commercially for use in 
one or several practical situations. In the case of this 
instrument, both the author and the publisher seem 
to have fallen short of meeting their responsibilities 
in this respect. One single, simple step might have 
been—and still might be—taken to correct this. 
Across the front of each test and each manual, there
should be stamped, in large red letters (preferably
letters which will glow in the stygian darkness of the
personality measurement field), the word EXPERIMENTAL.

It is experimental. It is an intriguing, promising, 
in many ways very carefully conducted experiment, 
but it is still an experiment. Until its validity has 
been established, it must remain an experiment, and 
it should not be released for any other purpose. 